Lower Bounds on Near Neighbor Search via Metric Expansion 

Rina Panigrahy Kunal Talwar 

Microsoft Research Microsoft Research 

rina@microsoft.com kunal@microsoft.com

Udi Wieder 
Microsoft Research 

uwieder@microsoft.com



Abstract 



In this paper we show how the complexity of performing nearest neighbor search (NNS) on a metric
space is related to the expansion of the metric space. Given a metric space we look at the graph obtained
by connecting every pair of points within a certain distance $r$. We then look at various notions of
expansion in this graph, relating them to the cell probe complexity of NNS for randomized and deterministic,
exact and approximate algorithms. For example, if the graph has node expansion $\Phi$ then we show that any
deterministic $t$-probe data structure for $n$ points must use space $S$ where $(St/n)^t > \Phi$. We show similar
results for randomized algorithms as well. These relationships can be used to derive most of the known
lower bounds in the well known metric spaces such as $\ell_1$, $\ell_2$, $\ell_\infty$ by simply computing their expansion. In
the process, we strengthen and generalize our previous results [18]. Additionally, we unify the approach
in [18] and the communication complexity based approach. Our work reduces the problem of proving
cell probe lower bounds of near neighbor search to computing the appropriate expansion parameter.

In our results, as in all previous results, the dependence on $t$ is weak; that is, the bound drops
exponentially in $t$. We show a much stronger (tight) time-space tradeoff for the class of dynamic low
contention data structures. These are data structures that support updates to the data set and that do not
look up any single cell too often.






1 Introduction 

In the Nearest Neighbor Problem we are given a data set of $n$ points $x_1, \dots, x_n$ lying in a metric space $V$. The
goal is to preprocess the data set into a data structure such that when given a query point $y \in V$, it is possible
to recover the data set point which is closest to $y$ by querying the data structure at most $t$ times. We want to
keep both the query time $t$ and the data structure space $m$ as small as possible. Nearest Neighbor Search
is a fundamental problem in data structures with numerous applications to web algorithms, computational 
biology, information retrieval, machine learning, etc. As such it has been researched extensively. 

The time-space tradeoff of known solutions crucially depends upon the underlying metric space. Natural
metric spaces include the spaces $\mathbb{R}^d$ equipped with the $\ell_1$ or $\ell_2$ distance, but other metrics such as $\ell_\infty$,
edit distance and earth mover distance may also be useful. The known upper bounds exhibit the 'curse
of dimensionality': for $d$-dimensional spaces either the space or time complexity is exponential in $d$. More
efficient solutions are known when considering approximations, e.g. [11], [13], [10], [2]; however, in general
these algorithms still demonstrate a relatively high complexity.

There is a substantial body of work on lower bounds covering various metric spaces and parameter
settings; we discuss the known bounds in Section 1.3. Traditionally, cell probe lower bounds for data
structures have been shown using communication complexity arguments [15]. Pătrașcu and Thorup [19] use
a direct sum theorem along with the richness technique to obtain lower bounds for deterministic algorithms.
Andoni, Indyk and Pătrașcu [3] showed randomized lower bounds using communication complexity lower
bounds for Lopsided Set Disjointness. In a previous work [18], the authors used a more direct geometric
argument to show lower bounds for randomized algorithms for the search version of the problem.

In this work we strengthen and significantly generalize our previous results. We give a common framework
that unifies almost all known cell probe lower bounds for near neighbor search. At one extreme, it
gives us the communication complexity lower bounds, and implies e.g. the result of [3]. At the other extreme,
we get direct data structure lower bounds leading to a strengthening to the decision problem of our
results in [18]. Our work in fact shows that all near neighbor lower bounds follow from basic expansion
properties of the metric space. Vertex expansion translates to lower bounds for deterministic data structures.
Edge expansion can be translated to lower bounds for randomized data structures, and this lets us
strengthen [18]. We also identify a new (to our knowledge) graph parameter that interpolates between vertex
and edge expansion, which we call robust expansion. We show that robust expansion suffices to prove
NNS lower bounds. Additionally, for random inputs in highly symmetric metrics, robust expansion also
translates to upper bounds in the cell probe model that match our lower bounds for constant $t$. Finally, we
present a natural conjecture regarding the complexity of approximate near neighbor search and show tight
bounds for dynamic low contention data structures.

1.1 Basic Definitions 

The Near Neighbor Problem is parameterized by a number r. As in the Nearest Neighbor Search Problem 
the input to the preprocessing phase is a data set of n points in a metric space. Given a query point y the 
goal is to determine whether the data set contains a point at distance at most $r$ from $y$. In the approximation
version (ANNS) the preprocessing phase also receives as input an approximation ratio $c$. Given a query
point $y$ the goal is to distinguish between the case where the closest data set point is at distance at most $r$
from $y$ and the case where the closest data set point is at distance at least $cr$ from $y$. Clearly a lower bound
for these problems also holds for nearest neighbor search.

We prove lower bounds for a generalization we call Graphical Neighbor Search (GNS) which we define
shortly. We then show that lower bounds on GNS imply ANNS lower bounds. In the GNS problem we are
given an undirected bipartite graph $G = (U, V, E)$ where the data set comes from $U$ and queries come from
$V$. For a node $u$ the set $N(u)$ denotes its neighbors in $G$. In the preprocessing phase we are given a set of
pairs $(x_1, b_1), \dots, (x_n, b_n)$ where $x_i$ is a vertex in $U$ and $b_i \in \{0, 1\}$. The goal is to build a data structure
such that given a node $y \in V$, if there is a unique $i$ such that $y \in N(x_i)$ then it is possible to query the data
structure $t$ times and output $b_i$. If there is no such $i$ or it is not unique, any output is considered correct.
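The GNS specification above can be made concrete with a brute-force reference that ignores the probe budget entirely. This is only an illustrative sketch: the names `gns_answer`, `NEIGHBORS` and `DATA`, the dictionary representation of the graph, and the use of `None` for "any output is acceptable" are assumptions made for the example, not part of the paper.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple


def gns_answer(neighbors: Dict[Hashable, Set[Hashable]],
               data: List[Tuple[Hashable, int]],
               y: Hashable) -> Optional[int]:
    """Return b_i if exactly one data point x_i has y in N(x_i).

    Returns None when no data point, or more than one, is adjacent to y;
    in that case the GNS definition accepts any output.
    """
    matching_bits = [b for (x, b) in data if y in neighbors[x]]
    return matching_bits[0] if len(matching_bits) == 1 else None


# Hypothetical toy instance: U = {"u1", "u2", "u3"}, V = {1, 2, 3, 4}.
NEIGHBORS = {"u1": {1, 2}, "u2": {2, 3}, "u3": {4}}
DATA = [("u1", 0), ("u2", 1)]
```

A data structure for GNS has to reproduce the non-`None` answers of this reference while reading only $t$ cells per query; the lower bounds concern how much space that forces.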

We observe that ANNS reduces to GNS when assuming a query point is at distance at most $r$ from some
$x_i$ and at least $cr$ from all other $x_j$. In this case the nodes of $U$ and $V$ correspond to the points in
the metric space, and the set of edges consists of all pairs of nodes at distance at most $r$. A formal reduction
is proven in Section 4, where we also show that average instances of ANNS translate to average instances of
GNS for which our lower bounds hold. The bounds we show depend only on the expansion properties of $G$.
We need the following definitions:

Definition 1.1 (Vertex expansion). Let $\mu$ be a probability measure over $U$ and $\nu$ be a probability measure
over $V$. The $\delta$-vertex expansion of the graph with respect to $\mu, \nu$ is defined as

$$\Phi_v(\delta) := \min_{A \subset V,\ \nu(A) \le \delta} \frac{\mu(N(A))}{\nu(A)}.$$

The vertex-expansion $\Phi_v$ is defined as the largest $k$ such that for all $\delta \le \frac{1}{2k}$, $\Phi_v(\delta) \ge k$.
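For small graphs the $\delta$-vertex expansion can be evaluated directly from the definition by enumerating all candidate sets $A$. The following is a minimal sketch under that brute-force assumption (exponential time, for illustration only); the function name, the dictionary encoding of $N(\cdot)$, $\mu$ and $\nu$, and the 6-cycle example are all hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def delta_vertex_expansion(neighbors_of_v, mu, nu, delta):
    """min of mu(N(A)) / nu(A) over nonempty A within V with nu(A) <= delta.

    neighbors_of_v[v] is the set N(v) inside U; mu and nu map vertices to
    probabilities. Returns None if no set A satisfies the constraint.
    """
    vertices = list(nu)
    best = None
    for size in range(1, len(vertices) + 1):
        for A in combinations(vertices, size):
            nu_A = sum(nu[v] for v in A)
            if nu_A == 0 or nu_A > delta:
                continue
            N_A = set().union(*(neighbors_of_v[v] for v in A))
            ratio = sum(mu[u] for u in N_A) / nu_A
            best = ratio if best is None else min(best, ratio)
    return best


# Example: U = V = Z_6, each v adjacent to v-1, v, v+1; uniform measures.
ADJ = {v: {(v - 1) % 6, v, (v + 1) % 6} for v in range(6)}
UNIFORM = {v: Fraction(1, 6) for v in range(6)}
```

In this example a single vertex expands by a factor of 3, while an arc of three consecutive vertices only expands by 5/3, which shows why the definition takes a minimum over all sets of bounded measure.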

Let $A \subset V$, $B \subset U$ and $\delta = \nu(A)$. Observe that if $E(A, B) = E(A, U)$ then $\mu(B) \ge \Phi_v(\delta)\,\nu(A)$. In
other words $\Phi_v(\delta)$ bounds the measure of the sets that cover all the edges incident on a set of measure $\delta$.
The notion of robust expansion relaxes this by requiring $B$ to cover at least a $\gamma$-fraction of the edges incident
on $A$. This idea is captured in the definition below. For simplicity we assume that $V = U$, that $\mu$ and $\nu$
are the uniform distribution, and that $G$ is regular. A more subtle definition which takes into account other
measures is presented in Section 3.

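Definition 1.2 (Robust expansion). $G$ has robust-expansion $\Phi_r(\delta, \gamma)$ if for all $A, B \subseteq V$ satisfying $|A| \le
\delta|V|$, $|B| \le \Phi_r(\delta, \gamma)|A|$, it is the case that $\frac{|E(A,B)|}{|E(A,V)|} \le \gamma$. Note that $\Phi_r(\delta, 1) = \Phi_v(\delta)$.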
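To get a feel for robust expansion in the uniform setting, one can compute, for each small set $A$, the size of the smallest $B$ that captures at least a $\gamma$ fraction of the edges incident on $A$, and then minimize $|B|/|A|$ over $A$. The sketch below follows that reading (the "smallest capturing set" phrasing used in Section 3) rather than the exact inequality form of Definition 1.2, so boundary cases may differ; the function names and the example graph are assumptions. For a fixed $A$ the smallest $B$ is found greedily, by taking vertices in decreasing order of the number of edges they receive from $A$.

```python
from fractions import Fraction
from itertools import combinations


def robust_expansion_of_set(adj, A, gamma):
    """Smallest |B| / |A| such that B captures >= gamma of the edges at A."""
    edges_into = {}
    for a in A:
        for b in adj[a]:
            edges_into[b] = edges_into.get(b, 0) + 1
    total = sum(edges_into.values())
    captured, size = 0, 0
    # Greedy is optimal here: we need the fewest vertices whose edge
    # counts sum to at least gamma * total.
    for count in sorted(edges_into.values(), reverse=True):
        if captured >= gamma * total:
            break
        captured += count
        size += 1
    return Fraction(size, len(A))


def robust_expansion(adj, delta, gamma):
    """Minimum over nonempty A with |A| <= delta * |V| (needs delta * |V| >= 1)."""
    vertices = list(adj)
    max_size = int(delta * len(vertices))
    return min(robust_expansion_of_set(adj, A, gamma)
               for k in range(1, max_size + 1)
               for A in combinations(vertices, k))


# Example: the 6-cycle with self-loops, as a regular graph on V = U = Z_6.
ADJ = {v: {(v - 1) % 6, v, (v + 1) % 6} for v in range(6)}
```

With $\gamma = 1$ the value coincides with the vertex expansion of the same graph, while lowering $\gamma$ can only decrease it, since fewer edges need to be captured.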

1.2 Our Contributions 

1.2.1 Bounds for Deterministic Algorithms 

In this section we require that the algorithm always output the correct answer. We show time-space tradeoffs
based on the vertex expansion properties of $G$. Our lower bounds are in the average case. Given a distribution
$\mu$ over $U$, a data set is built by sampling $n$ data set points independently from $\mu$.

Note that in order for the problem to be interesting we must have that $N(x_i)$ and $N(x_j)$ are likely to be
disjoint. We thus have the following definition:

Definition 1.3. A distribution $\mu$ over $U$ is said to be strongly independent for $G$ if

$$\Pr_{x, z \sim \mu}\left[N(x) \cap N(z) \ne \emptyset\right] \le \frac{1}{100n^2}.$$

Note that if $\mu$ is strongly independent and $x_1, \dots, x_n$ are sampled independently from $\mu$ then with
probability at least 0.99, $N(x_i) \cap N(x_j) = \emptyset$ for all $i \ne j$. In the following $m$ denotes the number of cells in the
data structure, $w$ denotes the word size in bits, and $t$ is the number of cell probes used by the algorithm.
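On an explicitly given graph the strong independence condition can be checked exactly by summing over all pairs. The sketch below assumes that $x$ and $z$ are two independent draws from $\mu$ (so the pair $x = z$ is included, as the definition is literally written); the function names and the toy instance are hypothetical.

```python
from fractions import Fraction


def collision_probability(neighbors, mu):
    """Pr over independent x, z ~ mu that N(x) and N(z) intersect."""
    return sum(mu[x] * mu[z]
               for x in mu for z in mu
               if neighbors[x] & neighbors[z])


def is_strongly_independent(neighbors, mu, n):
    """Check the condition Pr[N(x) and N(z) intersect] <= 1 / (100 n^2)."""
    return collision_probability(neighbors, mu) <= Fraction(1, 100 * n * n)


# Toy instance: 400 data vertices with pairwise disjoint neighborhoods,
# under the uniform distribution.
TOY_NEIGHBORS = {u: {u} for u in range(400)}
TOY_MU = {u: Fraction(1, 400) for u in range(400)}
```

Even with pairwise disjoint neighborhoods the collision probability is at least $\sum_x \mu(x)^2$, so under this reading the ground set must be large compared to $n^2$ for the condition to hold.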



Theorem 1.4. For a given $G$, let $\mu, \nu$ be probability measures such that $\mu$ is strongly independent, and the
vertex expansion with respect to $\mu, \nu$ is $\Phi_v(\cdot)$. Then any deterministic algorithm solving GNS must satisfy
the following inequalities

$$\left(\frac{mwt}{n}\right)^{t} \ge \Phi_v \qquad (1)$$

$$\frac{m^{t} w t}{n} \ge \Phi_v\!\left(\frac{1}{m^{t}}\right) \qquad (2)$$

These theorems, combined with known isoperimetric inequalities, yield most known cell probe lower
bounds for near neighbor problems, and generalize them to general expanding metrics. To see this consider
for example the $d$-dimensional hypercube equipped with the Hamming distance. It is shown in [19], [14]
that any deterministic solution for ANNS with approximation $1/\epsilon$ must satisfy $t \ge d\epsilon^2/\log(mwd/n)$. This
bound can be slightly improved by creating the following GNS instance: Let $U$ and $V$ both equal the set of
nodes of the hypercube, and let $E = \{(u, v) : |u - v|_1 \le \epsilon d\}$. Let $\mu$ and $\nu$ be the uniform distribution.
Chernoff bounds imply that for $d = \Omega(\log n)$, $|u - v|_1 \ge 0.49d$ with overwhelming probability, so $(G, \mu)$
is a strongly independent instance. A lower bound on this instance of GNS implies a lower bound on ANNS
with approximation $1/\epsilon$.

Now we use known isoperimetric properties: Harper's theorem (see e.g. [6]) implies that there is a
constant $a > 1$ such that $\Phi_v \ge a^{\epsilon^2 d}$. Plugging this into (1) we have that $t \ge d\epsilon^2 \log a/\log(mwd/n)$. In
Section 4 we discuss at greater length how to apply these theorems.
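The growth promised by Harper's theorem can be observed numerically. For set sizes that are exactly the size of a Hamming ball, the ball minimizes the size of its neighborhood, and the set of points within distance $s$ of a ball of radius $r$ is the ball of radius $r + s$. The sketch below computes the resulting ratio of measures under the uniform distribution; the function names are hypothetical, and it only covers set sizes that are exactly ball sizes.

```python
from fractions import Fraction
from math import comb


def ball_measure(d, r):
    """Uniform measure of a Hamming ball of radius r in {0,1}^d."""
    r = min(r, d)
    return Fraction(sum(comb(d, i) for i in range(r + 1)), 2 ** d)


def ball_expansion(d, r, s):
    """Ratio between the measure of the s-neighborhood of a radius-r ball
    and the measure of the ball itself."""
    return ball_measure(d, r + s) / ball_measure(d, r)
```

For instance, with $d = 4$ a single point (radius 0) expands by a factor of 5 in one step, while a ball of radius 1 expands by 11/5; for large $d$ and $s = \epsilon d$ the ratio for small balls is exponential in $d$, which is the behaviour used in the bound above.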

1.2.2 Bounds for Randomized Algorithms 

Assume that $G$ is regular. Let $x$ and $z$ be vertices drawn uniformly at random, and $y$ be a random neighbor
of $x$. We say $G$ has the property of being weakly independent if $\Pr[y \in N(z)] \le \gamma/n$ for a small enough
constant $\gamma$.

Theorem 1.5. There exists an absolute constant $\gamma$ such that the following holds. Any randomized algorithm
for a weakly independent instance of GNS which is correct with probability at least half (where the
probability is taken over the sampling of the input and the algorithm) satisfies the following inequalities:

$$\left(\frac{mwt}{n}\right)^{t} \ge \Phi_r\!\left(\frac{1}{m}, \frac{\gamma}{t}\right) \qquad (3)$$

$$\frac{m^{t} w}{n} \ge \Phi_r\!\left(\frac{1}{m^{t}}, \frac{\gamma}{t}\right) \qquad (4)$$

As an example, we show in Section 4 that for the hypercube with $E = \{(u, v) : |u - v|_1 \le (\frac{1}{2} - \epsilon)d\}$,
the robust expansion <l>f.(^, o(l)) > —^7733737. For d = fl,{logn/e^), the weak independence property is 

easy to verify. Plugging this into Equations |4l we conclude that m^*'^ w > n so that m > {—)it^. This 
result was previously shown by [3] for slightly larger d. 

Our framework suggests a natural conjecture on the complexity of approximate near neighbor problems. 
Conjecture 1.6. Any randomized t-probe datastructure for a weakly independent GNS instance must satisfy 

We point out that for some interesting metric spaces such as the Hamming cube and Euclidean space, 
the known upper bound matches the lower bound in the conjecture for a wide range of parameters. We next 
present some evidence in support of this conjecture. 



1.2.3 An Upper Bound 

There are cases where the bounds above are known to be tight when $t = O(1)$. We show that this is no
coincidence: In Section 5 we show that if $G$ is symmetric, there is an algorithm in the cell probe model that
solves random instances of GNS using space that matches the lower bound in equation ([H) for t = 1. 

1.2.4 Dynamic Data Structure 

In the dynamic version of the problem we want the data structure to support the operations of inserting and
deleting a point in the data set. Let $t_u$ denote the update time. A weaker version of the conjecture is the
following:

Conjecture 1.7. For any dynamic randomized t-probe data-structure for weakly independent GNS on n 
points, it holds that tut > <l>r(^, ^)^^^^ 

To see why this conjecture follows from the stronger one, observe that a data structure with update time
$t_u$ uses space $mw \le n t_u$ after $n$ inserts. We show that this weaker conjecture holds for a restricted family
of algorithms which we call low contention, i.e., those where no memory location of the data structure
is accessed by too many query points (see Section 6 for a formal definition). While this may seem like a
severe limitation, we remark that known LSH data structures, and our upper bound in Section 5, are in fact
dynamic and low contention under our definition.

We show that 

Theorem 1.8. For any low contention, dynamic t-probe datastructure for GNS on n points, the update time 
is at least Q {<^riT, -^)/32t'^). 

Plugging in the expansion of the hypercube, we see that for a wide range of parameters Locality Sensitive 
Hashing is optimal for the class of the low contention dynamic data structures over the hypercube. 

1.3 Related Work 

Most previous papers are concerned with the Hamming distance over the $d$-dimensional hypercube. The
cases of exact or deterministic algorithms were handled in a series of papers [8], [7], [14], [4]. These lower
bounds hold for any polynomial space. In contrast the known upper bounds are both approximate and
randomized, and with polynomial space can retrieve the output with one query. Chakrabarti and Regev [9] allow
for both randomization and approximation, with polynomial space, and show a tight bound for the nearest
neighbor problem. Pătrașcu and Thorup [19] showed lower bounds on the query time of near neighbor
problems with a stronger space restriction (near linear space), although their bound holds for deterministic or
exact algorithms. The metric $\ell_\infty$ is considered in an intriguing paper by Andoni et al. [1] who prove a lower
bound for deterministic algorithms. The paper uses the richness lemma, though the crux of the proof is an
interesting isoperimetric bound on $\ell_\infty$ for a carefully chosen measure.

We are aware of only two papers which prove time-space lower bounds for near neighbor problems 
where both randomization and approximation are allowed. 

Andoni, Indyk and Pătrașcu [3] show that for small $\epsilon > 0$, any $O(1)$-probe algorithm for the $(1 + \epsilon)$-approximate
near neighbor problem must use space $n^{\Omega(1/\epsilon^2)}$. This bound is tight for small enough $\epsilon >$
p]. Panigrahy et al. [18] show that space n^+^*^^) is needed for any algorithm with $t$ queries and $\epsilon$
approximation, for the search version of the problem. This bound is tight for constant $t$.

With the exception of [18], all previous bounds were proven using the communication complexity
framework [15], and in particular the richness lemma.



Comparison to [18]: While there is some overlap in the techniques between this work and [18], the
current work is much more general, and stronger even for the special case (our lower bound now applies
to the decision version of NNS). We show that expansion may serve as a single explanation that unifies all
previous results, and also gives a simple recipe to prove lower bounds for other metrics such as $\ell_\infty$ and edit
distance. While [18] essentially contained a version of the lower bound (4) with the edge expansion, we
are now able to additionally show (3). Additionally, we can use vertex expansion to show lower bounds for
deterministic data structures. Moreover, we show that the randomized lower bounds hold under the much
weaker notion of robust expansion. As we discuss in Section 1.5, this strengthening is provably needed for
deriving the right lower bound for the $(1 + \epsilon)$-approximation range for the hypercube. We remark that both
(2) and (4) hold for communication protocols. While we do not know if (1) and (3) hold for communication
protocols, our proofs do shed some light on how the two approaches differ, and make clearer how the data
structure is used in proving our lower bound.

Restricted Models: Higher lower bounds may be achieved when considering models which are more
restricted than the cell probe model. Beame and Vee [5] investigate branching programs. Krauthgamer and
Lee [12] show tight upper and lower bounds for the 'black box model' where the algorithm is only allowed
to query distances between points of the data set. They show that in this case the complexity of NNS is
determined by the intrinsic doubling dimension of the data set. Motwani, Naor and Panigrahy [16] prove
an LSH lower bound for $\ell_1$, which has recently been strengthened to the tight bound by O'Donnell, Wu and
Zhou [17].

1.4 Notation and Preliminaries 

A data structure for Graph Neighbor Search is defined as follows. Given a database of $n$ points $x_1, \dots, x_n \in
U$, and $b_1, \dots, b_n \in \{0, 1\}$, the preprocessing algorithm computes a set of $t$ tables $T_1, \dots, T_t$, where each
table stores $m$ words of $w$ bits each. We often call each such word a cell of the table. In practice there is only
one table, but for notational convenience and without loss of generality we let the data structure construct a
different table for each query.

The query algorithm is specified by $t$ lookup functions $F_1, \dots, F_t$, where $F_i$ takes in the query point $y$
and $(i - 1)$ words of $w$ bits each, and outputs an integer in $[m]$, and a function $F_A : V \times (2^w)^t \to \{0, 1\}$. On
a query $y$, the data structure looks up $c_1 = T_1[F_1(y)]$, $c_2 = T_2[F_2(y, c_1)]$, \dots, $c_t = T_t[F_t(y, c_1, \dots, c_{t-1})]$.
Finally it computes $F_A(y, c_1, \dots, c_t)$. Note that the lookup functions, the $F_i$'s and $F_A$, are fixed independent of
the database; only the tables $T_1, \dots, T_t$ can depend on $x_1, \dots, x_n, b_1, \dots, b_n$. We say the algorithm is
non-adaptive if the lookup functions are independent of the content of the tables, i.e. of the $c$ values.
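The query procedure just described can be written as a short loop. This is a minimal sketch of the cell probe query model, not an implementation of any particular data structure: the function name `run_query`, the use of Python callables for the lookup functions, the 0-based addresses (instead of $[m]$), and the toy two-probe example are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def run_query(tables: Sequence[Sequence[int]],
              lookups: Sequence[Callable[..., int]],
              output_fn: Callable[..., int],
              y) -> int:
    """Probe table i at address lookups[i](y, c_1, ..., c_i), then decide.

    The cost of the query is len(tables) probes; everything else is free
    in the cell probe model.
    """
    contents: List[int] = []
    for table, lookup in zip(tables, lookups):
        address = lookup(y, *contents)  # may depend on earlier cell contents
        contents.append(table[address])
    return output_fn(y, *contents)


# Hypothetical 2-probe example: the first cell stores a pointer into the
# second table, which stores the answer bit.
T1 = [1, 0]
T2 = [0, 1]
LOOKUPS = [lambda y: y % 2, lambda y, c1: c1]
OUTPUT = lambda y, c1, c2: c2
```

In this example the second lookup depends on the content read in the first probe, so the algorithm is adaptive; a non-adaptive algorithm would use lookups that ignore the `c` arguments.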

1.5 Overview of Techniques 

The core idea behind our approach is quite simple. We demonstrate it by showing a simple argument that
the vertex expansion of $G$ provides a lower bound on the space of 1-probe data structures for deterministic
algorithms. By the definition of vertex expansion, every set of $|V|/\Phi_v$ nodes is incident to at least half of
the nodes of $G$. Let $L$ be a uniformly random sample of a $1/\Phi_v$ fraction of the cells of the table $T$, and let
$Q$ be the set of nodes in $V$ for which the algorithm probes a cell in $L$. Clearly $Q$ is expected to contain a
$1/\Phi_v$ fraction of the nodes in $G$. Now consider a sample data set $(x_1, b_1), \dots, (x_n, b_n)$ where $x_1, \dots, x_n$
are randomly sampled nodes in the graph and $b_1, \dots, b_n$ are random bits. With overwhelming probability at
least a quarter of the $x_i$'s have a neighbor in the set $Q$, and thus the random bits associated with these points
should be retrievable from the contents of $L$ alone. We conclude that the total number of bits in $L$ is at least
$n/4$ and thus the space of the data structure is at least $n\Phi_v/4$ bits.

This basic sampling approach for 1-probe data structures can be extended to $t$-probe data structures in
two different ways.

Cell Sampling: Here we sample a $\Phi_v^{-1/t}$ fraction of the cells in each table. Thus a $1/\Phi_v$ fraction of $V$ is
expected to access only the sampled cells. This immediately gives bound (1) for non-adaptive algorithms.

Path Sampling: Here we sample a path as follows: we pick a cell randomly from the first table so that a
$\frac{1}{m}$ fraction of the vertices, $Q_1$, look up this cell. Then we sample a cell from the second table in such a way
that a $\frac{1}{m}$ fraction of $Q_1$ looks up this cell in the second read, and so on. This immediately leads to the lower
bound in (2) for non-adaptive algorithms.

We remark that the path sampling approach actually leads to communication complexity lower bounds
for the 2-player version of the problem where Alice has the query point and Bob has the database. Any
$t$-probe data structure with $m$ cells of $w$ bits each implies the existence of a $t$-round communication
protocol where Alice sends $\log m$ bits, and Bob sends $w$ bits, in each round. A communication protocol has
more freedom, however; unlike in a data structure, where the same table $T_2$ is used to answer any second
query, in a communication protocol the message Bob sends in the second round may depend not just on
the second message from Alice, but also on the first. Path sampling can be immediately translated to a
"transcript sampling" technique and thus gives lower bounds for communication protocols. There is no
similarly obvious translation for cell sampling.
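The simulation of a data structure by a protocol has a simple cost accounting: in each of the $t$ rounds Alice sends a cell address and Bob replies with the cell content. The helper below is a small sketch of that arithmetic; the function name is hypothetical, and rounding $\log m$ up to a whole number of bits is an assumption made here.

```python
def protocol_cost(m: int, w: int, t: int) -> tuple:
    """Bits sent by (Alice, Bob) when simulating a t-probe structure
    with m cells of w bits each."""
    address_bits = max(1, (m - 1).bit_length())  # ceil(log2 m), at least 1
    return t * address_bits, t * w
```

For example, `protocol_cost(1024, 64, 3)` gives 30 bits from Alice and 192 bits from Bob, which illustrates how lopsided the communication is when $w$ is much larger than $\log m$.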

We can extend these ideas and provide lower bounds for adaptive algorithms by observing the following 
two facts. Firstly, for a fixed data structure, the probability over a random data set that the data structure 
succeeds is exponentially small in n. On the other hand, the number of bits read by the sampling procedures 
above is sublinear, thus the number of all possible non-adaptive algorithms is subexponential. Informally,
this allows us to do a union bound over all possible values of the bits read. 

In randomized algorithms not all points in $N(x)$ are good query points for $x$. In particular, the specific
query point that queries the cells that are sampled may be a point on which the algorithm errs. The notion of
shattering plays a major role in extending the bound for this case: Given any fixed partitioning $A_1, \dots, A_m$
of $V$ such that each set is of cardinality $O(|V|/m)$, a randomly chosen $x$ has (with high probability) the
property that $\max_i |N(x) \cap A_i|$ is at most $|N(x)|/K$, for a $K$ that depends on the edge expansion. In other
words, $N(x)$ is shattered by the partitioning. Given that the lookup algorithm is correct for a large fraction
of $N(x)$, shattering suffices to show that the algorithm still gives the right answer for a majority of the points
in $N(x)$ which can be looked up from the cell sample (or the path sample).
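The shattering condition is easy to state in code for an explicit neighborhood and partition. The following sketch uses hypothetical helper names and a toy partition; it checks only the unweighted condition $\max_i |N(x) \cap A_i| \le |N(x)|/K$ described above.

```python
from fractions import Fraction


def largest_piece_fraction(neighborhood, partition):
    """max_i |N(x) & A_i| / |N(x)| for a partition A_1, ..., A_m of V."""
    return max(Fraction(len(neighborhood & part), len(neighborhood))
               for part in partition)


def is_shattered(neighborhood, partition, K):
    """True if no part of the partition holds more than a 1/K fraction
    of the neighborhood."""
    return largest_piece_fraction(neighborhood, partition) <= Fraction(1, K)


# Toy example: N(x) = {0, 1, 2, 3} and a partition of V = {0, ..., 6}.
N_X = {0, 1, 2, 3}
PARTITION = [{0, 4}, {1, 5}, {2, 3, 6}]
```

Here the largest piece holds half of $N(x)$, so the neighborhood is shattered for $K = 2$ but not for $K = 3$. In the lower bound the partition is the one induced on $V$ by a lookup function, so shattering says that no single cell serves too large a share of the good queries for $x$.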

In order to prove lower bounds for randomized adaptive algorithms we need to combine the ideas outlined
in the two previous paragraphs, which requires more work. Intuitively, for every $x$ such that $N(x)$
is shattered, and for any fixed subset $N'(x)$ on which the algorithm succeeds, the sampling is very likely
to recover the correct answer. Moreover, for every collection of bits read, almost all points shatter. While
it would be tempting to use a union bound at this point, that does not quite work. Informally, there are
dependencies everywhere: the part of $N(x_i)$ that the algorithm gets right depends on all the other $x_j$'s, the
bits that are read depend on the sampled cells, etc. The proof carefully defines a notion of shattering that
depends only on the $x$'s and not on the $N'(x)$'s, and argues (over the randomness in picking the $x_i$'s) that most
points get shattered. Separately, we argue that for a point that gets shattered, and for any fixed $N'(x)$, the
majority answer is correct with high probability (over the sampling procedure alone).

The notion of edge expansion does not quite suffice: for the hypercube when $r = (\frac{1}{2} - \epsilon)d$, for a fixed
partitioning $A_1, \dots, A_m$ of $V$ into cells of size $|V|/m$, the largest $|N(x) \cap A_i|/|N(x)|$ is likely to be quite
large (^ -\), whereas we would need it to be -^ to get the correct bound. The definition of robust
expansion $\Phi_r$ comes to our rescue here. We can show that while the largest $|N(x) \cap A_i|$ is usually large, the
large pieces account for a very small fraction of $N(x)$. In fact, after removing a vanishingly small fraction
of $|N(x)|$, every other piece is only about — ^. We show that our lower bound proofs are robust enough to
handle this weaker notion of shattering.

While our techniques do not improve the dependency on the query time $t$, they overcome some of the
inherent obstacles in the richness method, so for instance, strengthening the isoperimetric bound of $\ell_\infty$
would imply that the bound in [1] extends to randomized algorithms as well.

2 Deterministic Lower bounds 

In this section we prove Theorem 1.4. The analysis of deterministic algorithms involves node expansion and
does not require shattering. It allows us to demonstrate the techniques of cell sampling and path sampling
in a simple setting.

2.1 Cell Sampling 

The following theorem is a restatement of inequality (1).

Theorem 2.1. Let $\mu$ be such that $(G, \mu)$ is a strongly independent instance, and $\Phi_v$ be the vertex expansion
with respect to $\mu, \nu$. Then any deterministic algorithm solving GNS must satisfy $\left(\frac{mwt}{n}\right)^{t} \ge \Phi_v$.

Proof. Recall that $T_i$ represents a table with $m$ cells, from which the $i$'th query reads, and $F_i : V \to T_i$
denotes the $i$'th lookup function. We will state a procedure that obtains a set of at most $tm/\Phi_v^{1/t}$ cells so
that at least a $1/\Phi_v$ fraction of the query points only access these cells. We call this procedure cell sampling.
Note that here this procedure is entirely deterministic. A probabilistic variant is used in the next section.

Cell sampling procedure: The cells are obtained iteratively in $t$ phases; each phase corresponds to a
query of the table. In each phase at most $m/\Phi_v^{1/t}$ cells are chosen. The first lookup function $F_1$ induces a
partition over $V$. The set $L_1$ is chosen to be the $m/\Phi_v^{1/t}$ cells in $T_1$ that maximize $\nu(F_1^{-1}(L_1))$. In other
words, the first lookup function partitions $V$ according to its image in $T_1$. We choose $L_1$ to be the cells
corresponding to the $m/\Phi_v^{1/t}$ largest parts, as measured by $\nu$. Set $Q_1 \subseteq V$ to be those vertices, i.e.
$F_1^{-1}(L_1)$. The selection process continues iteratively in a similar manner. Let $L_i$ denote the set of cells
obtained in the $i$'th phase and let $Q_i \subseteq V$ denote the set of vertices that (if given as a query) access only
cells in $L_1, \dots, L_i$. In the $(i + 1)$'th phase we consider $F_{i+1}$ and set $L_{i+1}$ to be the $m/\Phi_v^{1/t}$ cells with highest
measure, where we restrict $\nu$ to $Q_i$. In other words, when measuring $F_{i+1}^{-1}(L_{i+1})$ we assign a measure of
$0$ to vertices outside $Q_i$. It is easy to argue inductively that $\nu(Q_i) \ge \Phi_v^{-i/t}$, so that $\nu(Q_t) \ge 1/\Phi_v$ and thus
$\mu(N(Q_t)) \ge \frac{1}{2}$.
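The cell sampling procedure can be sketched in a few lines for the special case of non-adaptive lookup functions, where each $F_i$ is simply a map from queries to cells. The code below is an illustration under that assumption (the proof handles adaptivity separately, by fixing the contents of the chosen cells); the function name, the dictionary encoding of the lookups, and the example are hypothetical.

```python
from fractions import Fraction


def cell_sampling(lookups, nu, cells_per_phase):
    """Greedy, deterministic cell selection over t phases.

    lookups[i][v] is the cell of table i probed by query v.
    Returns (list of chosen cell sets L_1..L_t, surviving query set Q_t).
    """
    surviving = set(nu)  # queries that so far touch only chosen cells
    chosen = []
    for F in lookups:
        mass = {}
        for v in surviving:  # queries outside Q_i contribute measure 0
            mass[F[v]] = mass.get(F[v], 0) + nu[v]
        heaviest = sorted(mass, key=mass.get, reverse=True)[:cells_per_phase]
        chosen.append(set(heaviest))
        surviving = {v for v in surviving if F[v] in chosen[-1]}
    return chosen, surviving


# Example with 8 queries under the uniform measure and two tables.
NU = {v: Fraction(1, 8) for v in range(8)}
F1 = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 2, 7: 3}
F2 = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1}
CHOSEN, Q = cell_sampling([F1, F2], NU, cells_per_phase=1)
```

Choosing the heaviest `cells_per_phase` out of `m` cells keeps at least a `cells_per_phase / m` fraction of the surviving measure in each phase, which is the inductive step behind the bound on $\nu(Q_i)$.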

Intuitively, the set of cells in $L_1, \dots, L_t$ encode half of the $b_i$ bits and therefore must contain $\Omega(n)$ bits,
which would imply the lower bound. Of course, $L_1, \dots, L_t$ depend upon the content of the table, which
depends upon the points in the data set. These dependencies can be handled by using a union bound over
all the possible values of $L_t$. To see this, fix the values written in the cells $L_1, \dots, L_t$ to some string $\omega$
and sample the $n$ data set points from $U$ independently according to $\mu$. Let $A_\omega$ denote the event that when
the value of the cells $L_1, \dots, L_t$ is $\omega$, an algorithm reading $\omega$ succeeds in guessing the $b_i$ bits for the data
set points that fall in $N(Q_t)$. Note that $Q_t$ depends only upon $\omega$ and that, since the procedure of obtaining
$L_t$ is deterministic, the locations of the cells obtained also depend only on $\omega$. Vertex expansion implies
that $\mu(N(Q_t)) \ge \frac{1}{2}$. Also, since $\mu$ is strongly independent and the algorithm is assumed to be correct,
$\Pr[\bigcup_\omega A_\omega] \ge \frac{1}{2}$. By Chernoff's bound, the probability that fewer than $n/8$ points fall in $N(Q_t)$ is at most
$2^{-n/8}$. Note that the $b_i$'s are chosen independently; therefore, for a fixed table, if $n/8$ points indeed fall in
$N(Q_t)$, then the probability that the sampled $b_i$'s match the output of the algorithm is at most $2^{-n/8}$. We conclude
that $\Pr[A_\omega] \le 2^{-n/8} + 2^{-n/8}$. Now, let $K = 1/\Phi_v^{1/t}$. There are $2^{Kmtw}$ ways of choosing $\omega$, so we must
have $2 \cdot 2^{-n/8} \cdot 2^{Kmtw} \ge \frac{1}{2}$. We conclude that $Kmtw \ge n/8$, which implies the theorem. $\Box$

2.2 Path Sampling 

We now prove Inequality (2).

Theorem 2.2. Let $\mu$ be strongly independent, and $\Phi_v$ be the vertex expansion with respect to $\mu, \nu$. Then any
data structure with a deterministic querying algorithm must satisfy $\frac{m^{t} w t}{n} \ge \Phi_v(1/m^{t})$.

Proof. The proof is similar to that of Theorem 2.1 with a different choice of parameters. We present it here
separately because the two approaches diverge in the next section when we deal with the randomized case.
We use the cell sampling technique to select a set of cells from the tables, only this time, in each phase we select
a single cell from the table (as opposed to selecting $m/\Phi_v^{1/t}$ cells in Theorem 2.1). We call the approach of
sampling a single cell from each table path sampling, because we sample a single possible "query path" along
the $t$ tables. We also observe that lower bounds based on this approach imply communication complexity
lower bounds.

Now the contents of $L_1, \dots, L_t$ are $tw$ bits, and $\nu(Q_t) \ge m^{-t}$ so that $\mu(N(Q_t)) \ge \Phi_v(m^{-t})m^{-t}$.
When fixing the bits of $L_t$ to be the string $\omega$, the expected number of data set points that fall in $N(Q_t)$
is at least $n\Phi_v(m^{-t})m^{-t}$. Define $A_\omega$ as before and recall that we have $\Pr[\bigcup_\omega A_\omega] \ge \frac{1}{2}$. Let $Z_\omega$ be the
number of data set points falling in $N(Q_t)$. Chernoff's bound implies that $\Pr[Z_\omega < \frac{1}{2} n\Phi_v(m^{-t})m^{-t}] \le
2^{-n\Phi_v(m^{-t})m^{-t}/8}$. Since the string $\omega$ now encodes $Z_\omega$ random bits, $\Pr[A_\omega] \le 2^{-n\Phi_v(m^{-t})m^{-t}/8} + 2^{-n\Phi_v(m^{-t})m^{-t}/2}$.
There are $2^{tw}$ ways of choosing $\omega$, so we have $2^{tw} \cdot 2^{-n\Phi_v(m^{-t})m^{-t}/8 + 1} \ge \frac{1}{2}$, which implies the theorem. $\Box$

3 Randomized Lower bounds 

3.1 Preliminaries 

To prove lower bounds for randomized data structures, we will use Yao's minimax theorem, and instead show
a distribution over instances such that for some constant $\delta > 0$, any deterministic $t$-probe data structure that
succeeds with probability $(1 - \delta)$ needs large space.

We consider the following randomized version of the Graph Neighbor Search (GNS) problem on a
bipartite graph $G = (U, V, E)$. We are given a set of $n$ tuples $(x_1, b_1), \dots, (x_n, b_n)$, where $x_i \in U$ and
$b_i \in \{0, 1\}$, to preprocess into a data structure. Then given a query $y \in V$, the query algorithm makes
$t$ probes into the data structure, and is expected to return $b_i$ if $x_i$ is the unique neighbor in $G$ of $y$ in
$\{x_1, \dots, x_n\}$ (if there is no unique neighbor, any output is considered valid).

Let $G = (U, V, E)$ be a bipartite graph and let $e$ be a probability distribution over $E$. Let $\mu(u) =
e(u, V) = \sum_{v \in V} e(u, v)$ be the induced distribution on $U$, and let $\nu(v) = e(U, v)$ be the induced distribution
on $V$. For $x \in U$, we denote by $\nu_x$ the conditional distribution of the endpoints in $V$ of edges incident on $x$,
i.e. $\nu_x(y) = e(x, y)/e(x, V)$.

Suppose we have a graph $G = (U, V, E)$, and the distribution $e$ on $E$. Then $(G, e)$ defines a distribution
over instances of GNS as follows. We select $n$ points $x_1, \dots, x_n$ independently from the distribution $\mu$
and pick $b_1, \dots, b_n$ independently and uniformly from $\{0, 1\}$. This defines the database
distribution. To generate the query, we pick an $i \in [n]$ uniformly at random, and sample $y$ independently
from $\nu_{x_i}$.
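The instance distribution can be sampled directly from the edge distribution $e$. The sketch below is illustrative only: it assumes $e$ is given as a dictionary from edges $(u, v)$ to probabilities, and the function name, the return format, and the example distribution are hypothetical. It uses the fact that the $U$-endpoint of an edge drawn from $e$ is distributed according to $\mu$.

```python
import random


def sample_instance(e, n, rng):
    """Sample ((x_1, b_1), ..., (x_n, b_n)), a planted index i, and a query y."""
    edges = list(e)
    weights = [e[edge] for edge in edges]
    # x_i ~ mu: take the U-endpoint of an edge drawn from e.
    xs = [rng.choices(edges, weights)[0][0] for _ in range(n)]
    bits = [rng.randrange(2) for _ in range(n)]
    i = rng.randrange(n)
    # y ~ nu_{x_i}: condition e on the U-endpoint being x_i.
    incident = [(v, p) for (u, v), p in e.items() if u == xs[i]]
    y = rng.choices([v for v, _ in incident], [p for _, p in incident])[0]
    return list(zip(xs, bits)), i, y


# Hypothetical edge distribution on U = {"a", "b"}, V = {1, 2}.
EDGE_DIST = {("a", 1): 0.25, ("a", 2): 0.25, ("b", 2): 0.5}
```

By construction the query $y$ is always a neighbor of the planted point $x_i$; whether $x_i$ is the unique such neighbor in the database is exactly what the weak independence condition controls.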



We say the tuple $(G, e)$ satisfies $\gamma$-weak independence (WI) if $\Pr_{x, z \sim \mu,\ y \sim \nu_x}[(y, z) \in E] \le \frac{\gamma}{n}$. In other
words, WI ensures that with probability $(1 - \gamma)$, for the instance generated as above, $x$ is indeed the unique
neighbor in $G$ of $y$ in $\{x_1, \dots, x_n\}$.
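For an explicit edge distribution the probability in the WI condition has a closed form: since $\mu(x)\nu_x(y) = e(x, y)$, it equals $\sum_{(x,y)} e(x, y) \sum_{z : (z, y) \in E} \mu(z)$. The sketch below evaluates that sum; it assumes the edge set $E$ is exactly the set of keys of the dictionary `e`, and the function name and example are hypothetical.

```python
from fractions import Fraction


def weak_independence_probability(e):
    """Pr over x, z ~ mu and y ~ nu_x that z is adjacent to y."""
    mu = {}
    for (u, v), p in e.items():
        mu[u] = mu.get(u, 0) + p
    return sum(p * sum(mu[z] for z in mu if (z, y) in e)
               for (x, y), p in e.items())


# Two disjoint edges with equal weight: z is adjacent to y only if z = x.
MATCHING = {("a", 1): Fraction(1, 2), ("b", 2): Fraction(1, 2)}
```

An instance is $\gamma$-weakly independent for $n$ points when the returned value is at most $\gamma/n$; the two-edge example returns 1/2, so it only qualifies for trivially small $n$.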

We next define the notion of expansion that we use. Recall that the vertex expansion of a set $A \subseteq V$ in
an unweighted graph $G$ is the ratio $\frac{|B|}{|A|}$, where $B = N(A)$ is the smallest set such that all edges incident on
$A$ are captured by $B$, i.e. $|E(B, A)| = |E(U, A)|$. A relaxation of this definition, which we call $\gamma$-robust
expansion, is the ratio $\frac{|B|}{|A|}$ where $B$ is now the smallest set that captures a $\gamma$ fraction of the edges incident on
$A$, i.e. $e(B, A) \ge \gamma e(U, A)$. The following definition generalizes this notion to weighted bipartite graphs.



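Definition 3.1 (Robust Expansion). The $\gamma$-robust expansion of a set $A \subseteq V$ is defined as

$$\phi_r(A, \gamma) \stackrel{\mathrm{def}}{=} \min_{B \subseteq U :\, e(B, A) \ge \gamma e(U, A)} \mu(B)/\nu(A).$$

Let $w$ be an auxiliary weight function on $U$ with $\sum_{u \in U} w(u) = 1$. The $\gamma$-robust expansion with respect
to $w$ is defined as

$$\phi_r^w(A, \gamma) \stackrel{\mathrm{def}}{=} \min_{B \subseteq U :\, e(B, A) \ge \gamma e(U, A)} w(B)/\nu(A).$$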

We say that $(G, e)$ has $(\beta, \gamma)$-robust expansion $\Phi_r^w = \Phi_r^w(\beta, \gamma)$ at least $K$ if for every subset $A \subseteq V$
such that $\nu(A) \le \beta$, we have $\phi_r^w(A, \gamma) \ge K$.
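For small graphs the robust expansion of a single set can be computed by brute force, which may help when checking the definition by hand. The sketch below assumes $e$ is given as a dictionary from edges to probabilities and enumerates all subsets $B \subseteq U$; the function name `robust_expansion` and the input format are illustrative choices, and the running time is exponential in $|U|$.

```python
from itertools import combinations

def robust_expansion(e, A, gamma, w=None):
    """Brute-force gamma-robust expansion of a set A of right vertices.

    e: dict (u, v) -> probability (assumed input format).
    w: optional dict u -> weight on U; defaults to the induced distribution mu.
    Returns the minimum of w(B) / nu(A) over B with e(B, A) >= gamma * e(U, A).
    """
    mu, nu = {}, {}
    for (u, v), p in e.items():
        mu[u] = mu.get(u, 0.0) + p
        nu[v] = nu.get(v, 0.0) + p
    if w is None:
        w = mu
    nu_A = sum(nu[v] for v in A)
    target = gamma * nu_A  # e(U, A) equals nu(A) by definition of nu
    left = list(mu)
    best = float("inf")
    for r in range(len(left) + 1):
        for B in combinations(left, r):
            B_set = set(B)
            captured = sum(p for (u, v), p in e.items() if u in B_set and v in A)
            if captured >= target - 1e-12:  # small tolerance for float rounding
                best = min(best, sum(w[u] for u in B) / nu_A)
    return best
```

On the complete bipartite graph with two vertices on each side and uniform edge weights, a single right vertex has 1-robust expansion 2 but 1/2-robust expansion 1, since one left vertex already captures half of its incident edge mass.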

For intuition, consider the setting where $G = (U, V, E)$ is derived naturally from an undirected graph
$H = (V_H, E_H)$ by making two copies of $V_H$ and, for each edge $(u, v) \in E_H$, placing the edges $(u_1, v_2)$ and
$(v_1, u_2)$. Formally, $U_G = V_H \times \{1\}$, $V_G = V_H \times \{2\}$, and $E_G = \{((u, 1), (v, 2)) \in U_G \times V_G : (u, v) \in
E_H\}$. Then for any set $A \subseteq V$, we have $\phi_r^w(A, 1) = w(N(A))/\nu(A)$, which is the vertex expansion of
$A$ in $H$ under $\nu$, for $w = \nu$. Similarly, if a set $A$ has conductance $e(A, A^c)/e(A, V_H)$ at most $1 - \gamma$, then
$e(A, A) \ge \gamma e(A, V)$ so that $\phi_r^w(A, \gamma) \le w(A)/\nu(A)$. A similar correspondence holds for directed graphs.

We next give some more definitions. 

Definition 3.2 ($\beta$-sparse). A collection $A_1, \ldots, A_k$ of disjoint subsets of $V$ is said to be $\beta$-sparse with
respect to $(G, e)$ if $\max_i \nu(A_i) \le \beta$.

We now recall the notion of strong shattering. 

Definition 3.3 (Strong Shattering). Given $(G, e)$ and a collection $A_1, \ldots, A_k$ of disjoint subsets of $V$, we
say the collection $\{A_i\}_i$ $K$-strongly shatters a point $x \in U$ if $\max_i \nu_x(A_i) \le \frac{1}{K}$.

We shall in fact show our lower bounds using a weaker notion of shattering, which allows a small
probability mass from $\nu_x$ to be in $A_i$'s with $\nu_x$ measure larger than $\frac{1}{K}$. For a real number $a$, let $(a)^+ =
\max(a, 0)$ denote the positive part of $a$. Note that strong shattering says that each of the $\nu_x(A_i)$'s is at most
$\frac{1}{K}$ so that $\sum_i (\nu_x(A_i) - \frac{1}{K})^+$ is zero. We relax this condition.

Definition 3.4 (Weak Shattering). Given $(G, e)$ and a collection $A_1, \ldots, A_k$ of disjoint subsets of $V$, we say
the collection $\{A_i\}_i$ $(K, \gamma)$-weakly shatters a point $x \in U$ if $\sum_i (\nu_x(A_i) - \frac{1}{K})^+ \le \gamma \nu(\cup_i A_i)$.

Definition 3.5 ($(K, \beta, \gamma)$-weak shattering (WS) property). We say a tuple $(G, e)$ satisfies the $(K, \beta, \gamma)$-weak
shattering (WS) property if for every $\beta$-sparse collection $A_1, \ldots, A_k$ of disjoint subsets of $V$,

$$\Pr_{x \sim \mu}[A_1, \ldots, A_k \ (K, \gamma)\text{-weakly shatters } x] \ge 1 - \gamma.$$
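The two shattering conditions are easy to check numerically for a fixed point $x$. The following sketch assumes finite measures stored as dictionaries from points of $V$ to masses; the helper names (`excess_mass`, `strongly_shatters`, `weakly_shatters`) are hypothetical and chosen only for this illustration.

```python
def part_mass(measure, part):
    """Mass that a measure (dict point -> mass) assigns to a set of points."""
    return sum(measure.get(y, 0.0) for y in part)

def excess_mass(nu_x, parts, K):
    """Sum over parts A_i of the positive part of nu_x(A_i) - 1/K."""
    return sum(max(part_mass(nu_x, A) - 1.0 / K, 0.0) for A in parts)

def strongly_shatters(nu_x, parts, K, tol=1e-12):
    """K-strong shattering: every part has nu_x-mass at most 1/K."""
    return all(part_mass(nu_x, A) <= 1.0 / K + tol for A in parts)

def weakly_shatters(nu_x, nu, parts, K, gamma, tol=1e-12):
    """(K, gamma)-weak shattering: the excess mass is at most gamma * nu(union of parts)."""
    covered = sum(part_mass(nu, A) for A in parts)  # parts are disjoint
    return excess_mass(nu_x, parts, K) <= gamma * covered + tol
```

For example, with $\nu$ uniform on four points, $\nu_x = (1/2, 1/4, 1/4, 0)$, parts $\{0\}, \{1\}, \{2, 3\}$ and $K = 4$, the point is not strongly shattered, but it is $(4, 1/4)$-weakly shattered because the only excess is the $1/4$ of mass sitting above the threshold in the first part.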



We record the following implication of weak shattering. 



Observation 3.6. If $(G, e)$ satisfies the $(K, \beta, \gamma)$-weak shattering property, then it also satisfies $(K/\lceil \beta'/\beta \rceil, \beta', \gamma)$-weak
shattering for any $\beta' > 0$.

Proof. For $\beta' \le \beta$, there is nothing to prove since every $\beta'$-sparse collection is also $\beta$-sparse. For $\beta' > \beta$,
we can arbitrarily break each set $A_i$ into $s = \lceil \beta'/\beta \rceil$ pieces to derive a $\beta$-sparse collection; the shattering
follows using the inequality $(\sum_{i=1}^{s} a_i)^+ \le \sum_{i=1}^{s} (a_i)^+$. □
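The last step of this proof can be written out as follows. The double indexing $A_{i1}, \ldots, A_{is}$ for the pieces of $A_i$ is introduced here only for illustration, and the second inequality in the second line holds for every $x$ that is $(K, \gamma)$-weakly shattered by the refined collection, which happens with probability at least $1 - \gamma$ over $x$.

```latex
\begin{align*}
\Bigl(\nu_x(A_i) - \tfrac{s}{K}\Bigr)^+
  &= \Bigl(\sum_{j=1}^{s} \bigl(\nu_x(A_{ij}) - \tfrac{1}{K}\bigr)\Bigr)^+
   \le \sum_{j=1}^{s} \bigl(\nu_x(A_{ij}) - \tfrac{1}{K}\bigr)^+ , \\
\sum_i \Bigl(\nu_x(A_i) - \tfrac{s}{K}\Bigr)^+
  &\le \sum_{i,j} \bigl(\nu_x(A_{ij}) - \tfrac{1}{K}\bigr)^+
   \le \gamma\, \nu\Bigl(\bigcup_{i,j} A_{ij}\Bigr)
   = \gamma\, \nu\Bigl(\bigcup_i A_i\Bigr).
\end{align*}
```

Hence the original collection $(K/s, \gamma)$-weakly shatters every such $x$.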

Also observe that 

Observation 3.7. Let $A_1, \ldots, A_k$ be a collection of disjoint subsets of $V$. Then for any $x \in U$, there is a
measure $\hat{\nu}_x$ such that (a) $\hat{\nu}_x$ is dominated by $\nu_x$, i.e. $\hat{\nu}_x(A) \le \nu_x(A)$ for all $A \subseteq V$, (b) $\hat{\nu}_x(A_i) \le \frac{1}{K}$ for all
$i \in [k]$, and (c) if $x$ is $(K, \gamma)$-weakly shattered, then $\hat{\nu}_x(\cup A_i) \ge \nu_x(\cup A_i) - \gamma \nu(\cup A_i)$.

Note that $\hat{\nu}$ is not necessarily a probability measure. Intuitively, $\hat{\nu}$ is a part of the measure $\nu$ that gets
shattered. Such a measure can be obtained by shaving the mass on points $y$ that fall in clusters with large $\nu_x$ mass.



Proof. For each $A_i$ with $\nu_x(A_i) > \frac{1}{K}$, we set $\hat{\nu}_x(y) = \frac{1}{K \nu_x(A_i)} \nu_x(y)$ for each $y \in A_i$. For the remaining
$A_i$'s, $\hat{\nu}_x(y)$ is set to $\nu_x(y)$. The dominance is immediate, and the small loss property follows from the fact that
$\hat{\nu}_x(A_i) = \frac{1}{K}$ for every $A_i$ of the first type, so that $\nu_x(A_i) - \hat{\nu}_x(A_i) = (\nu_x(A_i) - \frac{1}{K})^+$. □
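The shaving construction in this proof is short enough to state as code. The sketch below assumes $\nu_x$ is a dictionary from points to masses and the parts are disjoint sets; the function name `shave` is a hypothetical label for the construction.

```python
def shave(nu_x, parts, K):
    """Scale every part whose nu_x-mass exceeds 1/K down to mass exactly 1/K.

    nu_x: dict point -> mass; parts: disjoint sets of points.
    Points in light parts (and points outside all parts) keep their mass.
    """
    hat = dict(nu_x)
    for A in parts:
        mass = sum(nu_x.get(y, 0.0) for y in A)
        if mass > 1.0 / K:
            scale = 1.0 / (K * mass)  # new mass of A is mass * scale = 1/K
            for y in A:
                if y in hat:
                    hat[y] = nu_x[y] * scale
    return hat
```

The total mass removed equals the sum of the positive parts $(\nu_x(A_i) - \frac{1}{K})^+$, so under weak shattering the loss is at most $\gamma \nu(\cup A_i)$.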

Finally, we shall use the following simple information-theoretic lemma: 

Lemma 3.8. Let $\delta < \frac{1}{4}$, and let $Enc : \{0, 1\}^n \to \{0, 1\}^N$ and $Dec : \{0, 1\}^N \to \{0, 1\}^n$ be functions such
that $|b - Dec(Enc(b))|_1 \le \delta n$ with probability at least $\frac{1}{2}$ when $b$ is drawn at random. Then there exists a
constant $r = r(\delta) > 0$ such that $N \ge rn$, where $\lim_{\delta \to 0} r(\delta) = 1$.

Proof. Let $C \subseteq \{0, 1\}^n$ be a binary error correcting code with positive rate and relative minimum distance $2\delta < \frac{1}{2}$.
Then for a random $v \in \{0, 1\}^n$, $C_v = \{b \in v + C : |Dec(Enc(b)) - b|_1 \le \delta n\}$ has expected size $2^{rn}$ for
an $r = r(\delta) > 0$. Since the minimum distance of $C_v$ is $2\delta n$, the values $Enc(b) : b \in C_v$ are all distinct,
leading to the claimed bound. □

The lemma can be extended to the setting where the encoder and the decoder share some randomness. 

Corollary 3.9. Let $\delta < \frac{1}{4}$, and let $Enc : \{0, 1\}^n \times \{0, 1\}^R \to \{0, 1\}^N$ and $Dec : \{0, 1\}^N \times \{0, 1\}^R \to
\{0, 1\}^n$ be functions such that $E_{b,z}[|b - Dec(Enc(b, z), z)|_1] \le (\delta/2)n$. Then there exists a constant
$r = r(\delta) > 0$ such that $N \ge rn$, where $\lim_{\delta \to 0} r(\delta) = 1$.

Proof. There must exist a $z$ such that $E_b[|b - Dec(Enc(b, z), z)|_1] \le (\delta/2)n$. By Markov's inequality,
$\Pr_b[|b - Dec(Enc(b, z), z)|_1 > \delta n] \le \frac{1}{2}$. The claim follows. □
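For completeness, the Markov step in this proof reads as follows, for the fixed choice of $z$ obtained by averaging:

```latex
\[
\Pr_b\bigl[\,|b - Dec(Enc(b,z),z)|_1 > \delta n\,\bigr]
  \le \frac{E_b\bigl[\,|b - Dec(Enc(b,z),z)|_1\,\bigr]}{\delta n}
  \le \frac{(\delta/2)\,n}{\delta n}
  = \frac{1}{2}.
\]
```

So for this $z$ the deterministic pair $Enc(\cdot, z)$, $Dec(\cdot, z)$ decodes to within $\delta n$ errors except with probability at most one half over $b$, which is the hypothesis needed to apply Lemma 3.8.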

3.2 Main Result 

The main result of this section is the following. 
Theorem 3.10. There exists an absolute constant $\gamma$ such that the following holds. Let $(G, e)$ satisfy $\gamma$-weak
independence (WI). Then for t < n^, any deterministic t-probe data structure for the distribution over GNS 
instances defined by $(G, e)$ that succeeds with probability $(1 - \gamma)$ must satisfy

r^r>^r{-,}) (5) 

n m t 

— >*"(4?) <« 

n rrv t 

where w is an arbitrary auxiliary weight function, and t is o{ni). 

The theorem will follow from Lemma 3.11 and Theorems 3.12 and 3.22 that we prove next.

3.3 Expansion to Shattering 

Lemma 3.11 (Expansion implies shattering). Let $A_1, \ldots, A_k$ be a $\beta$-sparse collection of disjoint subsets of
$V$. Then

$$\Pr_{x \sim \mu}[x \text{ is } (K, \gamma)\text{-weakly shattered}] \ge (1 - \gamma)$$

for $K = \Phi_r(\beta, \frac{\gamma^2}{4})\gamma^3/16$.

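Proof. We will show that if $\Pr_{x \sim \mu}[x \text{ is } (K, \gamma)\text{-weakly shattered}] < (1 - \gamma)$, then one of the $A_i$'s does
not expand enough, thus deriving a contradiction.

Let $\eta = \nu(\cup A_i)$. Observe that $E_{x \sim \mu}[\nu_x(\cup A_i)] = \nu(\cup A_i) = \eta$. Thus by Markov's inequality,
$\Pr_{x \sim \mu}[\nu_x(\cup A_i) \ge \frac{2\eta}{\gamma}] \le \frac{\gamma}{2}$.

Suppose that $\Pr_{x \sim \mu}[x \text{ is } (K, \gamma)\text{-weakly shattered}] < (1 - \gamma)$. Thus for at least a $\frac{\gamma}{2}$ fraction of $x$'s
(drawn from $\mu$), $\nu_x(\cup A_i) \le \frac{2\eta}{\gamma}$ and yet $x$ is not $(K, \gamma)$-weakly shattered. Let $B$ be the set of such $x$'s. In
other words, the set $B$ satisfies

• $\mu(B) \ge \frac{\gamma}{2}$.

• For each $x \in B$, $\nu_x(\cup A_i) \le \frac{2\eta}{\gamma}$.

• For each $x \in B$, $\sum_i (\nu_x(A_i) - \frac{1}{K})^+ > \gamma\eta$.

Construct a weighted graph $H_0$ on $B \times [k]$, where we put an edge between $(x, i)$ with weight $\mu(x)\nu_x(A_i) =
e(x, A_i)$ if $\nu_x(A_i) > \frac{1}{K}$. Thus $e_{H_0}(x, i) \le e(x, A_i)$ so that $e_{H_0}(B, i) \le \nu(A_i)$.

The (unweighted) degree of each node $x \in B$ in $H_0$ is at most $\frac{\nu_x(\cup A_i)}{1/K} \le 2\eta K/\gamma$, since each edge
incident on $x$ in $H_0$ contributes at least $\frac{1}{K}$ to $\nu_x(\cup A_i)$. Moreover, by the properties of $B$ above, the total
edge weight in $H_0$, $e_{H_0}(B, [k])$, is at least $(\frac{\gamma}{2})(\gamma\eta) = \frac{\gamma^2 \eta}{2}$.

Let $H_1$ be the subgraph of $H_0$ formed by deleting all nodes $i \in [k]$ such that the total edge weight
$e_{H_0}(B, i)$ incident on $i$ is at most $\frac{\gamma^2}{4}\nu(A_i)$. The total edge weight deleted in the process is at most $\frac{\gamma^2}{4}\eta$, so
that the total edge weight in $H_1$, $e_{H_1}(B, [k])$, is at least $\frac{\gamma^2 \eta}{4}$.

Let $R \subseteq [k]$ be the set of nodes on the right surviving in $H_1$. Since each node in $B$ has unweighted
degree at most $2\eta K/\gamma$ in $H_1$, $\sum_{i \in R} \mu(N_{H_1}(i)) \le (2\eta K/\gamma) \sum_{x \in B} \mu(x) \le 2\eta K/\gamma$. On the other hand,
$\sum_{i \in R} \nu(A_i)$ is at least the total weight $e_{H_1}(B, R)$ of edges in $H_1$, which is lower bounded by $\frac{\gamma^2 \eta}{4}$. It follows
by an averaging argument that there is a node $i^* \in R$ such that

$$\frac{\mu(N_{H_1}(i^*))}{\nu(A_{i^*})} \le \frac{2\eta K/\gamma}{\gamma^2 \eta/4} = \frac{8K}{\gamma^3}.$$

Let $B_{i^*} = N_{H_1}(i^*)$. Since $i^* \in R$, we have $e(B_{i^*}, A_{i^*}) \ge \frac{\gamma^2}{4}\nu(A_{i^*})$. Since $\nu(A_{i^*}) \le \beta$ by assumption,

$$\Phi_r(\beta, \tfrac{\gamma^2}{4}) \le \phi_r(A_{i^*}, \tfrac{\gamma^2}{4}) \le \frac{\mu(B_{i^*})}{\nu(A_{i^*})} \le \frac{8K}{\gamma^3},$$

which contradicts the definition of $K$. □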

3.4 Path Sampling 

In this section, we show the following theorem 

Theorem 3.12. There exists a constant 7 > such that the following holds. Let {G,e) satisfy 'j-weak 
independence (WI) and {K, —t, j)-weak shattering property. Then for t < n*, any deterministic t-probe 
data structure for the distribution over GNS instances defined by (G, e) that succeeds with probability (1— 7) 
must use space m at least Q{{nK'^/w)t). 

Proof Sketch: Suppose that a datastructure with m < {j^nK/Af^w) ~ exists that succeeds with prob- 
abihty (1 — 7'^). We use it construct a randomized encoding and decoding algorithm for a random bi- 
nary vector b € {0, 1}". We sample xi, . . . ,x„ from /x, and build the data structure for the database 
{xi,bi), . . . , {xn, bn) to get t tables Ti, . . . ,Tt where each Tj contains m cells of u; bits each. 

We show how to sample $s$ cells from each table, for a suitable $s$, and let $Enc(b, z)$ be the contents of
those cells (where $z$ is used as the randomness to pick the $x_i$'s and in the sampling process). The decoding
algorithm essentially takes the majority answer in $\nu_{x_i}$, restricted to queries that the data structure can answer
based on the sampled cells, as its guess for $b_i$. The success of the data structure, and the WI property, imply
that the answer on $\nu_{x_i}$ is equal to $b_i$ with probability $(1 - \gamma)$, for most $x_i$'s. We show that the weak shattering
property is sufficient to guarantee (using Chernoff bounds) that the majority answer on the restriction of $\nu_{x_i}$
is still equal to $b_i$ with high probability. For suitably small $\gamma$, this violates Corollary 3.9.

Intuitively, the $t$ lookup functions break $V$ into $m^t$ pieces. We could sample $s$ of these pieces. If all
$x_i$'s were strongly shattered, each piece would have little influence on the measure in $\nu_{x_i}$ that can be looked up from
the sample. For large enough $s$, Chernoff bounds would then imply that the restricted measure has a large
probability of answering $b_i$ as well, completing the proof.

The proof below, while following the above intuition, is complicated by several factors. The
lookup functions are adaptive, so the $m^t$ pieces that $V$ breaks into depend on the table contents. Thus the
shattering itself depends on the table contents sampled, making a one-shot sampling argument untenable. We
instead give an inductive proof that handles these dependencies. The weaker shattering assumption forces
us to slightly change the decoding algorithm, to take a majority under a modified measure. Additionally, the
pieces may be of different sizes, and we need to break large pieces to ensure sparseness.

We are now ready to present a detailed proof. 

Proof. We assume the contrary so that for m < {'y^nK/At'^w)~t , there is t-probe space m data structure 
that succeeds with probability 1 — 7'^ on the distribution defined by {G, e). We use this data structure to 
construct functions $Enc$ and $Dec$ violating Corollary 3.9. We use the auxiliary input $z$ as shared randomness
between $Enc$ and $Dec$ throughout this proof.

We first sample points $x_1, \ldots, x_n$ from $\mu$ and use $(x_1, b_1), \ldots, (x_n, b_n)$ as the database. Note that when
$b$ and $z$ are random, the $(x_i, b_i)$ are distributed according to $(G, e)$, and hence for a random $y$ drawn from
the appropriate distribution, the data structure will return the right answer for $y$ with probability $1 - \gamma^3$. By
Markov's inequality, with probability $(1 - \gamma)$ it is the case that, except for a $\gamma$ fraction of the $i$'s, the data structure
answers correctly with probability $(1 - \gamma)$ when $y$ is drawn from $\nu_{x_i}$. By WI, except with probability $\gamma$, $y$ has
a unique neighbor in $x_1, \ldots, x_n$ so that the correct answer is $b_i$. Thus except for a $\gamma$ fraction of the $i$'s, with
probability $(1 - 2\gamma)$, the data structure returns $b_i$ on a random $y$ from $\nu_{x_i}$. Thus the tables $T = T_1, \ldots, T_t$
are a valid, albeit long, encoding that can be decoded appropriately by taking the majority answer on $\nu_{x_i}$ as
a guess for $b_i$. In the rest of the proof, we argue that in fact a random sample of the tables suffices.



Sampling Procedure: We will sample $s$ "paths" in the table, where $s$ is set to $s \stackrel{\mathrm{def}}{=} \frac{\gamma^2 n}{4tw}$. Further we assume
that $w \ge 2 \log m$.

Let $F_1 = F_1(y)$, $F_2 = F_2(y, \alpha_1)$, $F_3 = F_3(y, \alpha_1, \alpha_2)$ etc. be the adaptive lookup functions that the
data structure uses. The $j$th path consists of $t$ cells $\Lambda_{j1}, \ldots, \Lambda_{jt}$, one from each of the $t$ tables, sampled
sequentially. We will also get a telescoping sequence of subsets $V \supseteq A^{(j1)} \supseteq \cdots \supseteq A^{(jt)}$, where $A^{(jk)}$ denotes
the set of queries that access the sampled cells at locations $\Lambda_{j1}, \ldots, \Lambda_{j(k-1)}$ in the first $k - 1$ tables, for
some $j \in [s]$. Observe that the cells accessed in $T_k$ depend on the contents of the cells accessed in the
previous tables.

We first describe how to sample a single path $\Lambda_{11}, \ldots, \Lambda_{1t}$. To sample from the first table we look at
the partition of $V$ into $m$ parts induced by the value of $F_1(y)$ over all $y \in V$. Let $A^1_1, \ldots, A^1_{2m}$ be a
$\frac{1}{m}$-sparse partition of $V$ that refines $\{F_1^{-1}(l) : l \in [m]\}$. Such a partitioning can be obtained by starting
with $\{F_1^{-1}(l) : l \in [m]\}$ and repeatedly splitting parts larger than $\frac{1}{m}$ into smaller pieces. This splitting can
be done arbitrarily, and results in a $\frac{1}{m}$-sparse partitioning containing at most $2m$ parts; we pad this with
empty parts to get exactly $2m$ sets $A^1_1, \ldots, A^1_{2m}$. This corresponds to a table $T_1$ of size $2m$, where the
cells corresponding to the parts that were split are replicated appropriately. The first cell of the path is
simply obtained by picking a random index $\Lambda_{11}$ into this table. Let $A^{(11)} = A^1_{\Lambda_{11}}$ be the sampled part and
let $C_{11}$ denote the contents of the corresponding cell.
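The refinement step can be illustrated with a small sketch. It makes simplifying assumptions that are not in the text: the measure $\nu$ is uniform on $N$ points labelled $0, \ldots, N-1$, and $m$ divides $N$, so that a part is $\frac{1}{m}$-sparse exactly when it has at most $N/m$ points. The function name `sparse_refinement` is hypothetical.

```python
def sparse_refinement(parts, universe_size, m):
    """Refine a partition with at most m parts into exactly 2m parts of measure <= 1/m.

    Assumes the uniform measure on {0, ..., universe_size - 1} and that m divides
    universe_size. Large parts are cut into chunks of at most universe_size // m
    points, and the result is padded with empty sets.
    """
    cap = universe_size // m  # largest allowed number of points per piece
    refined = []
    for part in parts:
        pts = sorted(part)
        for start in range(0, len(pts), cap):
            refined.append(set(pts[start:start + cap]))
    # A part with c points yields ceil(c / cap) <= c / cap + 1 pieces, so at most
    # m parts covering the universe yield at most m + m = 2m pieces in total.
    assert len(refined) <= 2 * m
    refined += [set() for _ in range(2 * m - len(refined))]
    return refined
```

Sampling the first cell of a path then amounts to drawing a uniform index into the returned list of $2m$ parts.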

Inductively, suppose that we have defined $\Lambda_{11}, \ldots, \Lambda_{1k}$, cell contents $C_{11}, \ldots, C_{1k}$, and sets $A^{(11)}, \ldots, A^{(1k)}$,
so that for all points in $A^{(1k)}$ the query algorithm looks up $\Lambda_{11}, \ldots, \Lambda_{1k}$ in the first $k$ lookups, given the
contents $C_{11}, \ldots, C_{1(k-1)}$. Inductively, we ensure that $\nu(A^{(1k)}) \le \frac{1}{m^k}$. The $(k+1)$th lookup function,
given the contents $C_{11}, \ldots, C_{1k}$, partitions the set $A^{(1k)}$ into $m$ parts, and as above, we can refine this
partition to get a $\frac{1}{m^{k+1}}$-sparse partitioning $A^{k+1}_l : l \in [2m]$. We sample $\Lambda_{1(k+1)}$ uniformly from $[2m]$, and
denote by $C_{1(k+1)}$ the contents of the corresponding cell. We define $A^{(1(k+1))}$ to be $A^{k+1}_{\Lambda_{1(k+1)}}$. Clearly
$A^{(1(k+1))}$ has the desired inductive properties. Continuing in this fashion, we get $\Lambda_{11}, \ldots, \Lambda_{1t}$, $C_{11}, \ldots, C_{1t}$
and $A^{(11)}, \ldots, A^{(1t)}$.

We repeat the above process $s$ times to get $s$ such paths. The matrix $\Lambda_{jk}$ ($j \in [s]$, $k \in [t]$) denotes the
sampled cell locations for the $s$ paths, the matrix $C_{jk}$ denotes the contents of these cells, and the sets $A^{(jk)}$
denote the telescoping sequence of subsets for each of the $s$ paths.

For a technical reason, the entries in the first column of the $\Lambda$ matrix are drawn without
replacement from $[2m]$, thus ensuring that this column consists of $s$ random distinct values from $[2m]$. For
a matrix $U$ and sets $I$, $J$ of indices, let $U(I, J)$ denote the submatrix of $U$ indexed by $I$ and $J$.

The measure of $\cup_{j \in [s]} A^{(jk)}$: We first show that the measure $\nu(\cup_{j \in [s]} A^{(jk)})$ is concentrated around $\frac{s}{(2m)^k}$.
Lemma 3.13. u{Uj^[sjA^^^^) is at most (1 + 2k J s ) (2m)'' ' ^^'^^P^ with probability ^.

Proof. We argue inductively. For k = \, the expected value of Yi = v{}Jj(z[s\A^-^^') is exactly i^. 
Moreover, by — -sparsity of j4i, . . . , A2m, and using Chernoff bounds (for negatively correlated r.v.'s), the 
deviation from the mean is at most ^/2s\og(t/^ / rn, except with probability p-. 



Inductively, the expected value of Ffc+i =^ i/(U,gr,iA(^('=+^») is exactly ^Yk < {l+Sk- /^^sIVt) 



^je[s\^ ; i:, v^^av^Li^ 2^^fe ^ y^T^rvy - J(2mj^+T 

by the induction hypothesis. A Chernoff bound argument implies that the deviation from the mean is at most 



„2 



■\/2s log{t/"f)/m'^^^, except with probability ^. The claim follows. D 

For the rest of the proof, we will assume that for all k < t, v{VJji^u-\A^^^'^) is indeed at most (1 + 

2k^[^^)j^ < 1^ for as long at t,«; are o{n-4). 

The Encoder: The encoding is simply set to be the matrix $C$. Note that the matrix $C$, along with $\Lambda$, which
is part of the shared randomness, is enough to compute the answer computed by the data structure for every
$y \in \cup_{j \in [s]} A^{(jt)}$.

In the rest of the proof, we argue that there is a good decoding algorithm that recovers most of the $b_i$'s.
We do so in two steps. We first argue that if $x_i$ shatters at all levels, then the decoding algorithm succeeds
with high probability. We then argue that in fact most $x_i$'s must shatter at all levels.

If every $x_i$ shatters, the sampled cell contents are sufficient to estimate $b_i$'s:

We first argue that the sampling process ensures that the majority vote on the sample agrees with the
true majority, if the appropriate shattering happens at each level. For ease of notation, for $k \ge 1$, let $K_k =
K m^{k-t}$. Note that by Observation 3.6 the $(K, \frac{1}{m^t})$-weak shattering implies $(K_k, \frac{1}{m^k})$-weak shattering. We
start by defining the shattering event formally.

Definition 3.14 (WShatter). We will say that WS hatter k+i{x, A{[s], [k]), C{[s], [k])) occurs if the collec- 
tion {Ai : i G [-s], ^ G [w-]} {Kk+i,"/'^ /t) -weakly shatters x. We use the notation WShatter{x, A, C) to 
denote the event Ak<tW S hatter k{x,A{[s], [k]), C{[s], [k])). 

When $\Lambda$ and $C$ are obvious from context, we will simply abbreviate these events as $WShatter_k(x)$ and
$WShatter(x)$.

Based on Observation 13. 7[ we define a sequence of measures. Let z>^ = u^. For any k < t, consider 

the collection {A^ : j G [s],l G [m]}- Let iy^^^ be the measure guaranteed by observation 13.71 so 
that i^x^^ is dominated by u^, and satisfies i>^+^(A|'' ) < j^ — . Moreover, assuming W Shatter k+i{x), 

M^j^[s]M'im]A\'''^) - '>^HUje[s],ie[2m]4'''^) is small. We set i>^+\y) = min(z>^(y), i>^i(y)) for 
every y G F. Thus v^^^ is a part of i>^ that is shattered at level {k + 1). We remark that if x was strongly 
shattered at each level, P^ = u^ for all k. 

Let $V_0$ (resp. $V_1$) denote the set of vertices for which the query algorithm, given the table population,
outputs 0 (resp. 1). So $V_b$ and $V_{b^c}$ are the sets of queries for which the query algorithm outputs the bit $b$ and
its complement $b^c$ respectively.

The decoding algorithm works as follows: for each $i$, we would like to compute the majority answer
restricted to the set $\cup_{j \in [s]} A^{(jt)}$, under the measure $\nu_{x_i}$ restricted to this set. To deny any one random choice
a large influence on the outcome, we take the majority under the measure $\hat{\nu}^t_{x_i}$. Thus the decoder outputs

$$\hat{b}_i = \arg\max_b \sum_{j \in [s]} \hat{\nu}^t_{x_i}(V_b \cap A^{(jt)}).$$

To prove that this decoding is usually correct, we show that the measure of $\cup_j A^{(jk)}$ under $\hat{\nu}^k_x$ remains
close to its expectation, and that the measure of points $y \in \cup_j A^{(jk)}$ where the data structure returns the
wrong answer remains small. We define two more events.

Definition 3.15 (Rep). For k > I, we let Repk{x, A{[s], [k]), C{[s], [k — 1])) be the event 

Definition 3.16 (Small). For k > 1, we let Smallk{x, b, A([s], [k]), C{[s], [k])) denote the event 

/c7 s 



E^J(HnA(*))<(2,+ 

i6W ^ ' 

For convenience, let Smallo{x, b) denote the event iyx{Vb) = ^'^(Vb) < 27. 

By the discussion above, except with probability $\gamma$, $Small_0(x_i, b_i^c)$ occurs for a $(1 - \gamma)$ fraction of the $i$'s. We
assume that this is indeed the case.

With the definitions in place, we are now ready to argue that each of these events happens for most
$x_i$'s. For brevity, we use $Rep_k(x)$, $Small_k(x, b)$ when the other arguments are obvious from context. It is
immediate from the definitions that

Lemma 3.17. If $Rep_t(x_i)$ and $Small_t(x_i, b_i^c)$ occur, then the decoding $\hat{b}_i$ agrees with $b_i$.

We argue that assuming $WShatter_k(x)$ for each $k$, the events $Rep_t(x)$ and $Small_t(x)$ indeed happen
with high probability. The following two lemmas form the base case and the induction step of such an
argument.

We first argue that 



Lemma 3.18. For any x, b, 



7^ 
Pr [{Repi{x) I WShatteri{x)] > 1 - — 

A([5],l) t 

72 

Pr [Smalli{x,b)) \ SmallQ{x,b)] > 1 

A([s],l) t 



Proof. Assuming WShatteri{x), we have 

2m 2m 2 



t 

1=1 1=1 
Since the A^^^^ are drawn at random without replacement from these 2m sets, each A^ contributes

^j^xi^i ) to the expectation ofY = X^iefsl ^xi^^'^^'') ^^'^ this lower bounds E,[Y]. Finally, since these 
terms are negatively correlated, Chernoff bounds imply 

Y>{1-1)K[Y], 

except with probability exp(— 7^i^iE[y]/2t^), since each term in F is in (0, ■^). The claim follows. 

Next we argue Smalli{x,b), i.e. we need to upper bound X],gM i^xi^b H A^^'^^). The bound on the 
expectation follows from Smallo{x, b) and the sampling. A Chernoff bound argument identical to that for 
Repi completes the proof. D 

Similarly, we argue that 
Lemma 3.19. For any k > 1, any x, b, 

Pr [Repk+i{x) I (Repkix) A WS hatter k+iix))] > 1 - ^ 

A{[s],k+l) t 

Pr [Smallk+iix^b) I Smallk{x,b)] > 1 

A([s],A:+l) t 

Proof. To prove Repk+i{x), we use W Shatterk+i{x) and Repk{x). By weak shattering, YljeU] i ^x^^(A ) — 
z>^(Ujg[s]yl'^''^)) — ^zv(Ujg[g]A(-'''')). By disjointness of the A'-^^'^'s and using Repk{x), the first term is at 
least (1 t~)T2^^- Using lemma l3T3l the second term is at most ^ • ,^^,^, . This lower bounds the 

expectation of y = Yl,je\s\ v^^^^ {-^^^^^^^''^) since each yl-jC^+i) is chosen uniformly from A^ s. The 
choices for different j's are independent, so that Chernoff bounds imply that 

y > (1 - ^)E[y], 

except with probability exp(— 7^Erfc_|_iE[y]/2t^). A calculation identical to the previous lemma implies 
that Repk+i{x) occurs in this case. 

An identical Chernoff bound argument suffices to show Smallk+i{x, b). D 

And thus by induction. 
Lemma 3.20. For any x, b, 

Pr [RepAx) A ^WShatter(x)] > 1 - 7^ 

A{H,[t]) 

Pr [Smallt{x,b) \ Smallo{x,b)] > 1 — 7 

A(W>[*]) 

If for most $x_i$, $WShatter(x_i)$ occurred, this would imply that the decoding algorithm succeeds with
high probability. However, the event $WShatter_{k+1}(x_i)$ depends on the contents $C([s], [k])$, which are
determined by the table population, which depends on $x_i$ itself.

Proving that weak shattering happens for most Xi 's: 
The weak shattering property implies that for any $k$, for a fixed table population $T = T_1, \ldots, T_t$ (which
determines $C([s], [k])$ via $\Lambda([s], [k])$),

7^ 
Pr [WShatterk+i{x)] > 1 - — , 

X'^^i t 

Thus for any fixed T and A, it is the case that 

Pi [WS hatter (x)] > 1-7^. 

X^fJ. 

Let $ManyShatter(T, \Lambda, \mathbf{x})$ be the event that $WShatter(x)$ occurs for all but a $2\gamma$ fraction of the $x_i$'s
in the instance $\mathbf{x}$, i.e. $\sum_i \mathbb{1}(WShatter(x_i, \Lambda([s], [k]), C(T, \Lambda)([s], [k]))) \ge (1 - 2\gamma)n$.

Lemma 3.21.

$$\Pr_{\mathbf{x}}[\exists T, \Lambda : \neg ManyShatter(T, \Lambda, \mathbf{x})] \le \exp(-\gamma^2 n/4).$$

Proof. First consider a fixed $T$, $\Lambda$. Since the $x_i$'s are drawn independently, Chernoff bounds imply that

$$\Pr_{\mathbf{x}}[\neg ManyShatter(T, \Lambda, \mathbf{x})] \le \exp(-\gamma^2 n/2).$$

Further, note that the event $ManyShatter(T, \Lambda, \mathbf{x})$ depends on $T$ only through the contents $C(T, \Lambda)([s], [t])$.
Thus doing a union bound over all possible values of $C$ and $\Lambda$,

$$\Pr_{\mathbf{x}}[\exists T, \Lambda : \neg ManyShatter(T, \Lambda, \mathbf{x})] \le 2^{st(w + \log 2m)} \exp(-\gamma^2 n/2).$$

2 

The claim follows since s = -m — 7,^ r, — r. D 

We assume for the rest of the proof that the database $\mathbf{x} = x_1, \ldots, x_n$ indeed has this property; this
changes the failure probability by a negligible amount.

Let $T(\mathbf{x}, b)$ be the table population built by the data structure. Lemma 3.21 implies that

$$\sum_i \mathbb{1}(WShatter(x_i)) \ge (1 - 2\gamma)n.$$

Thus using Lemma 3.20,

$$\sum_i \mathbb{1}(Rep_t(x_i) \wedge Small_t(x_i, b_i^c)) \ge (1 - 6\gamma)n.$$

Since $Dec(Enc(b, z), z)_i = b_i$ whenever $Rep_t(x_i) \wedge Small_t(x_i, b_i^c)$, it follows that

$$E\Bigl[\sum_i \mathbb{1}(Dec(Enc(b, z), z)_i = b_i)\Bigr] \ge (1 - 6\gamma)n.$$

It follows that

$$E[|Dec(Enc(b, z), z) - b|_1] \le 6\gamma n.$$

The rare events ignored during the rest of the proof add an additional $O(\gamma n)$ to this expectation.
For small enough $\gamma$, the size of the encoding $stw$ is smaller than $(\gamma^2/4)n < rn$ (since $\lim_{\delta \to 0} r(\delta) = 1$),
contradicting Corollary 3.9. Hence the claim. □
3.5 Cell Sampling

In this section, we show a different sampling technique, which gives a different lower bound. The main
theorem is:

Theorem 3.22. There exists a constant 7 > such that the following holds. Let (G, e) satisfy 'j-weak 
independence (WI) and {K, ^, ^)-weak shattering property. Then for t < n'i, any deterministic t-probe 
data structure for the distribution over GNS instances defined by {G, e) that succeeds with probability (1 —7) 
must use space m at least Q{'y^nK^ /wt'^ log(t/7)). 

Proof. The proof is analogous to that for Theorem 3.12. We define encoding and decoding procedures that
compress a random string. The primary difference is in the sampling procedure; instead of sampling "paths"
as in the previous section, we sample cells from each table, and argue that the set of points that can be looked
up using only paths in the sample is sufficient to recover the bits $b_i$. Specifically, we define events analogous
to $Rep$ and $Small$ in the previous section, and show that they occur for many points.

We assume the contrary so that for m < ^^nK^ /wt^ log(t/7)), there is t-probe space m data structure 
that succeeds with probability 1 — 7^^ on the distribution defined by {G,fi, {iyx}xev)- We use this data 
structure to construct functions $Enc$ and $Dec$ violating Corollary 3.9. We use the auxiliary input as shared
randomness between $Enc$ and $Dec$ throughout this proof. The database $(x_1, b_1), \ldots, (x_n, b_n)$ is defined as
before.

Cell Sampling Procedure: Let s = 8t'^mlog{t/^)/^'^K2i . Let Fi = Fi{y), F2 = ^2(1/, ai), F3 = 
Fsiy, ai, 02) etc. be adaptive lookup functions that the data structure uses. The sampling is done in t steps 
one for each table. At each step we will get subsets V = A^ D A^ D A^, . . . ,A^ so that all the queries in 
A^ only access the sampled cells for the first i lookups. 

Let $A^0 = V$ and let $A^1_l$ for $l \in [2m]$ be a $\frac{1}{m}$-sparse partition obtained by refining $\{F_1^{-1}(l) : l \in [m]\}$;
this can be done as before by arbitrarily breaking up cells larger than $\frac{1}{m}$. Let $\Lambda_{11}, \ldots, \Lambda_{s1}$ be a random
subset of $[2m]$ of size $s$. Let the contents of the respective cells in a table population $T$ be denoted by
$C_{11}, \ldots, C_{s1}$. Thus the sample from the first table consists of $s$ rows $\Lambda_{11}, \ldots, \Lambda_{s1} \in [2m]$ whose contents
are $C_{11}, \ldots, C_{s1} \in \{0, 1\}^w$. We set $A^1 = \cup_{j \in [s]} A^1_{\Lambda_{j1}}$.

Let $A^2_l : l \in [2m]$ be a $\frac{1}{m}$-sparse partitioning obtained by refining $\{A^1 \cap F_2^{-1}(l) : l \in [m]\}$ as above.
We pick a random subset $\Lambda_{12}, \ldots, \Lambda_{s2}$ of $[2m]$ of size $s$, and let $C_{12}, \ldots, C_{s2}$ denote the relevant set of
contents from $T$. We set $A^2 = \cup_{j \in [s]} A^2_{\Lambda_{j2}}$ to be the set of queries $y \in V$ that look up one of the sampled cells
in the first two tables.

Repeating this process, we get $s \times t$ matrices $\Lambda$ and $C$, and sets $A^1, \ldots, A^t$. Note that in any execution of
the procedure, the set $A^k$ depends only on the samples $\Lambda$ and the contents $C$ read from the table population.
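A toy version of this procedure can be sketched as follows. It is a simplification under stated assumptions: the sparse refinement and the replication of split cells are omitted, each table is a plain list of cell contents, and the adaptive lookup functions are passed as Python callables. The name `cell_sample` and the argument layout are hypothetical.

```python
import random

def cell_sample(queries, tables, lookups, s, rng):
    """Sample s cells from each table and track which queries stay answerable.

    tables: list of t lists of cell contents.
    lookups[k](y, seen): index probed in tables[k] by query y, where seen is the
        tuple of contents read in the earlier probes (this models adaptivity).
    Returns (sampled, history): sampled[k] is the set of sampled cells of table k,
    and history[k] is the set of queries whose first k + 1 probes all hit sampled cells.
    """
    alive = {y: () for y in queries}  # surviving query -> contents read so far
    sampled, history = [], []
    for table, lookup in zip(tables, lookups):
        cells = set(rng.sample(range(len(table)), s))  # s distinct cells
        sampled.append(cells)
        survivors = {}
        for y, seen in alive.items():
            loc = lookup(y, seen)
            if loc in cells:
                survivors[y] = seen + (table[loc],)
        alive = survivors
        history.append(set(alive))
    return sampled, history
```

The sets in `history` are nested, mirroring the telescoping sequence of query sets, and each one depends only on the sampled locations and on the contents read from them.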

The measure of $A^k$. We first show that the measure $\nu(A^k)$ is concentrated around $(\frac{s}{2m})^k$.

Lemma 3.23. i/(Ujg[5]yl'') is at most (1 + 2A:v/ ^^y^-' )(^)^, except with probability 



2 



t^ 



def 



Proof We argue inductively. For A; = 1, the expected value of Yi = i^{A^) is exactly ^. Moreover, by 
— -sparsity of Ai, . . . , A2m, and using Chernoff bounds (for negatively correlated r.v.'s), the deviation from 
the mean is at most ^/2s]og{t/J)/rn, except with probability ^ . 

Inductively, the expected value of y^+i =^ J^(^^+^) is exactly ^Yk < (1 + 2k^f^^^^){^)''+'^ by 
the induction hypothesis. A Chernoff bound argument identical to the one above completes the proof. D 
For the rest of the proof , we will assume that for all /c < t, u{A'') is indeed at most {l+2ky g )i^)'' <

3 ( s \k 
2\2m/ • 

The Encoder: The encoding is set as before to be the matrix $C$. Note that the matrix $C$, along with $\Lambda$, which
is part of the shared randomness, is enough to compute the answer computed by the data structure for every
$y \in A^t$.

If every $x_i$ shatters, the sampled cell contents are sufficient to estimate $b_i$'s:

We argue that the sampling process ensures that the majority vote on the sample agrees with the true
majority, if the appropriate shattering happens at each level. We start by defining the shattering event
formally.

Definition 3.24 (WShatter). We will say that W Shatter k^i{x, A([s], [k]),C{[s\, [k])) occurs if the collec- 
tion {Ai : I £ [fn]} {K,'^'^ /t) -weakly shatters x. We use the notation W S hatter {x, A, C) to denote the 
event Ak<tWShatterk{x,A{[s], [k]),C{[s], [k])). 

When $\Lambda$ and $C$ are obvious from context, we will simply abbreviate these events as $WShatter_k(x)$ and
$WShatter(x)$.

We once again define a sequence of measures. Let i>^ = v^. For any k < t, consider the collection 
{A^, I S [m]}. Let iy^^^ be the measure guaranteed by observation l3.7l so that i)^^^ is dominated by u^, and 
satisfies i>!^~^^{A^) < ^. Moreover, assuming W Shatter kj^i{x), Vx{UiAf) — i)^'^^{}JiA'l) is small. We set 
^x^^{y) — ™iii(^x(y)) ^x^^{y))- We remark that if x was strongly shattered at each level, 9^ = v^ for all 
k. 

Let $V_0$ (resp. $V_1$) denote the set of vertices for which the query algorithm, given the table population,
outputs 0 (resp. 1). So $V_b$ and $V_{b^c}$ are the sets of queries for which the query algorithm outputs the bit $b$ and
its complement $b^c$ respectively.

The decoding algorithm works as follows: for each $i$, we would like to compute the majority answer
restricted to the set $A^t$, under the measure $\nu_{x_i}$ restricted to this set. Instead we take the majority under the
measure $\hat{\nu}^t_{x_i}$. Thus the decoder outputs

$$\hat{b}_i = \arg\max_b \hat{\nu}^t_{x_i}(V_b \cap A^t).$$
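The decoder is a weighted majority vote over the surviving queries. A minimal sketch, assuming the shaved measure is a dictionary from queries to masses and the data structure's answers on the surviving queries are already known; the name `decode_bit` and the tie-breaking rule toward 0 are illustrative choices that the text does not specify.

```python
def decode_bit(hat_nu, answers, survivors):
    """Weighted majority of the answers over the surviving queries.

    hat_nu: dict query -> mass under the shaved measure for the point x_i.
    answers: dict query -> bit (0 or 1) output by the query algorithm.
    survivors: queries answerable from the sampled cells.
    Ties are broken toward 0 (an arbitrary choice made for this sketch).
    """
    mass = [0.0, 0.0]
    for y in survivors:
        mass[answers[y]] += hat_nu.get(y, 0.0)
    return 0 if mass[0] >= mass[1] else 1
```

Applying `decode_bit` once per database point, with the measure and surviving set for that point, yields the decoded string that is compared against $b$.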

To argue that the decoding is correct for most $x_i$'s, we show that the measure of $A^k$ under $\hat{\nu}^k_x$ remains
close to its expectation, and that the measure of points $y \in A^k$ where the data structure returns the wrong
answer remains small. We define two more events.

Definition 3.25 (Rep). For k > 1, we let Repk{x, A{[s],[k]),C{[s],[k — 1])) be the event 

t 2m 
Definition 3.26 (Small). For k > 1, we let Smallk{x, b, A{[s], [k]),C{[s], [k])) denote the event 

i>'AVbnA^)<{2^ + ^){^)^ 

For convenience, let Smallo{x, b) denote the event UxiVb) = ^'^(Vf,) < 27. 
As in the path sampling case, except with probability $\gamma$, $Small_0(x_i, b_i^c)$ occurs for a $(1 - \gamma)$ fraction of the $i$'s. We
assume that this is indeed the case.

With the definitions in place, we are now ready to argue that each of these events happens for most
$x_i$'s. For brevity, we use $Rep_k(x)$, $Small_k(x, b)$ when the other arguments are obvious from context. It is
immediate from the definitions that

Lemma 3.27. If $Rep_t(x_i)$ and $Small_t(x_i, b_i^c)$ occur, then the decoding $\hat{b}_i$ agrees with $b_i$.

We argue that assuming $WShatter_k(x)$ for each $k$, the events $Rep_t(x)$ and $Small_t(x)$ indeed happen
with high probability. The following two lemmas form the base case and the induction step of such an
argument.

We first argue that 

Lemma 3.28. For any x, b, 

Pr [{Repi{x) I WShattenix)] > 1 - — 

A([s],l) t 

72 

Pr [Smalli{x,b)) \ Smallo{x,b)] > 1 

A{[s],l) t 

Proof. Assuming WS hatter i{x), we have 

2m 2m 2 

1=1 1=1 

Since we draw s sets at random without replacement from these 2m sets, each A^* contributes ^i^x (Af) 

to the expectation of y = I'xiA^) = ^lefsl '^i(^A i) '^^^ ^^^ lower bounds E,[Y]. Finally, since these 
terms are negatively correlated, Chernoff bounds imply 

y > (1 - |)E[y], 

2 

except with probability exp{—^K{-^)). The claim follows by an easy calculation. 

Next we argue Smalli{x,b), i.e. we need to upper bound J2je\s] ^xi^b H A^). The bound on the 
expectation follows from Smallo{x, b) and the sampling. A Chernoff bound argument identical to that for 
Repi completes the proof. D 

Similarly, we argue that 
Lemma 3.29. For any $k \ge 1$, any $x$, $b$,

\[ \Pr_{A([s],k+1)}\left[\mathrm{Rep}_{k+1}(x) \mid \mathrm{Rep}_k(x) \wedge \mathrm{WShatter}_{k+1}(x)\right] \ge 1 - \frac{\gamma^2}{t}, \]

\[ \Pr_{A([s],k+1)}\left[\mathrm{Small}_{k+1}(x,b) \mid \mathrm{Small}_k(x,b)\right] \ge 1 - \frac{\gamma^2}{t}. \]

Proof. To prove $\mathrm{Rep}_{k+1}(x)$, we use $\mathrm{WShatter}_{k+1}(x)$ and $\mathrm{Rep}_k(x)$. By weak shattering, X];ef2ml ^x~^^\^i
D^{A^) - ^u{A^). By Repk{x), the first term is at least (1 - ^){^)''- Using lemma [l23 the 



(^f)> 



second term is at most ^ • |(2^)'^- This lower bounds the expectation of Y = t'^+^(^(^+^)) = 
S?efsl ^x^^(^A ) since A,fc+i is a uniformly random subset of $[2m]$ of size $s$. Chernoff bounds then
imply that

y>(l_2)E[y], 

2 

except with probability exp(- ^i^E[y]). A calculation similar to the previous lemma implies that $\mathrm{Rep}_{k+1}(x)$ occurs in this case.

An identical Chernoff bound argument suffices to show $\mathrm{Small}_{k+1}(x, b)$. □

And thus by induction, 
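Lemma 3.30. For any $x$, $b$,

\[ \Pr_{A([s],[t])}\left[\mathrm{Rep}_t(x) \mid \wedge_k \mathrm{WShatter}_k(x)\right] \ge 1 - \gamma^2, \]

\[ \Pr_{A([s],[t])}\left[\mathrm{Small}_t(x,b) \mid \mathrm{Small}_0(x,b)\right] \ge 1 - \gamma. \]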

If for most $x_i$, for all $k$, $\mathrm{WShatter}_k(x_i)$ occurred, this would imply that the decoding algorithm succeeds with high probability.

The rest of the proof is identical to that for Theorem 3.12, since once again, the event $\mathrm{WShatter}(x)$ depends on the table population only through the contents $C$.

□

4 Applications 

We show how lower bounds on GNS imply lower bounds for ANNS. We stress that these bounds hold for the average case where the $n$ data-set points are sampled randomly from a distribution over $V$. Thus, if with high probability the distance between all pairs of points in the data set is at least $cr$, then the bounds above hold also for the approximate nearest neighbor within factor $c$. The following table lists all these bounds and how they follow from our work.



metric space    appr                 det / rand   bound                             ref   Thm
$\ell_1$        $1/\epsilon$         det          $t \ge d\epsilon^2/\log(mwd/n)$   …     1.4
$\ell_1$        $1+\epsilon$         rand         $mw \ge n^{\ldots}$                     1.5
$\ell_1$        $1/\epsilon$         rand         $mw \ge n^{\ldots}$               …     1.5
$\ell_\infty$   $\log_\rho \log d$   det          $mw \ge n^{\rho/t}$               [1]   1.4

Table 1: Known lower bounds, and how they follow from Theorems 1.4 and 1.5

4.1 GNS to ANNS

In the decisional version of the $(c, r)$-ANNS problem we have a metric space $\mathcal{M}$ and parameters $c$ and $r$. We preprocess $n$ points $x_1, \ldots, x_n$ into a data structure. When given a query point $y$ the goal is to distinguish between the case where $d(x_i, y) \le r$ for some $i \in [n]$, and the case where $d(x_i, y) > cr$ for all $i$. The query algorithm is required to output 1 in the former case and 0 in the latter case, and may report anything if neither of the two cases holds.
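The decision problem just defined can be pinned down with a brute-force reference procedure. The sketch below is only an illustration: the function names `decide_anns` and `hamming` are hypothetical, bit strings under Hamming distance are an assumed example metric, and treating the boundary $d(x_i, y) = r$ as a "near" case is an assumption.

```python
from typing import Callable, Optional, Sequence


def hamming(u: str, v: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(u, v))


def decide_anns(points: Sequence[str], y: str, c: float, r: float,
                dist: Callable[[str, str], float] = hamming) -> Optional[int]:
    """Brute-force reference answer for the decisional (c, r)-ANNS problem.

    Returns 1 if some data point is within distance r of y, 0 if every data
    point is farther than c*r from y, and None in the gap between the two
    cases, where any output is acceptable.
    """
    nearest = min(dist(x, y) for x in points)
    if nearest <= r:          # boundary treated as "near" (assumption)
        return 1
    if nearest > c * r:
        return 0
    return None
```

A data structure for the problem only has to agree with this procedure whenever it returns 0 or 1; the lower bounds concern how few cell probes suffice to do so.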

We show that if we have appropriate distributions over $\mathcal{M}$, we can derive lower bounds for $(c, r)$-ANNS by simply computing the relevant expansion parameter.

Theorem 4.1 (GNS to ANNS Deterministic). Let $c > 1$ and let $\mu$ be a distribution over a metric $\mathcal{M} = (V, d)$ satisfying:

($c$-Strong Independence) $\Pr[d(x,z) \le (c+1)r] \le$ ^.

Let $G_r = (V, \{(u, v) : d(u, v) \le r\})$. Then GNS on $(G_r, \mu)$ reduces to $(c, r)$-approximate GNS on $\mathcal{M}$.

Proof. Given a GNS instance $(x_1, b_1), \ldots, (x_n, b_n)$, we consider the dataset $D_1 = \{x_i : b_i = 1\}$ as our input for the $(c, r)$-ANNS problem. It is easy to see that when $x_i \sim \mu$ and $b_i \sim \{0, 1\}$, this set $D_1$ is a uniformly random dataset from $\mu$. The $c$-strong independence implies strong independence for the GNS instance. Whenever $d(x_i, x_j) > (c+1)r$ and $d(x_i, y) \le r$, the triangle inequality gives $d(x_j, y) \ge d(x_i, x_j) - d(x_i, y) > cr$, so that for the $c$-approximate NNS instance, the answer is 1 if and only if $x_i$ is in $D_1$, i.e. if and only if $b_i = 1$. The claim follows. □
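The reduction in this proof can be traced on a small instance. The following is a minimal sketch under stated assumptions: points are bit strings under Hamming distance, a brute-force scan stands in for the ANNS data structure, and all function names are hypothetical.

```python
def hamming(u, v):
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(u, v))


def gns_to_anns_dataset(instance):
    """D_1 = {x_i : b_i = 1}, the ANNS data set built from a GNS instance."""
    return [x for x, b in instance if b == 1]


def well_separated(instance, c, r, dist=hamming):
    """True if every pair of GNS points is more than (c + 1) * r apart."""
    xs = [x for x, _ in instance]
    return all(dist(xs[i], xs[j]) > (c + 1) * r
               for i in range(len(xs)) for j in range(i))


def answer_gns_query(instance, y, r, dist=hamming):
    """Recover the bit b_i for a query y within distance r of x_i.

    A brute-force scan of D_1 stands in for the (c, r)-ANNS data structure.
    When the instance is well separated, every x_j with j != i is farther
    than c * r from y, so the decision answer equals b_i.
    """
    d1 = gns_to_anns_dataset(instance)
    return 1 if any(dist(x, y) <= r for x in d1) else 0
```

On a well separated instance, querying a point within distance $r$ of $x_i$ returns exactly $b_i$, which is the content of the reduction.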

Thus to prove deterministic data structure lower bounds for $c$-approximate NNS, it suffices to exhibit $r$ and a distribution $\mu$ which satisfies $c$-strong independence and has large expansion.
Similarly

Theorem 4.2 (GNS to ANNS Randomized). Let $c > 1$ and let $e$ be a distribution over pairs of points in a metric $\mathcal{M} = (V, d)$. Let $\mu(x) = e(x, V)$ and $\nu(y) = e(V, y)$. Suppose that for small enough $\gamma$

\[ (c\text{-Weak Independence}) \quad \Pr_{y \sim \nu,\, z \sim \mu}\left[d(y,z) \le cr\right] \le \frac{\gamma}{n}, \]

and

\[ \Pr_{(x,y) \sim e}\left[d(x, y) \le r\right] \ge 1 - \gamma. \]

Then GNS on $(G, e)$ reduces to $(c, r)$-approximate GNS on $\mathcal{M}$.

Proof. As before, given a GNS instance $(x_1, b_1), \ldots, (x_n, b_n)$, we consider the dataset $D_1 = \{x_i : b_i = 1\}$ as our input for the $(c, r)$-ANNS problem. It is easy to see that when $x_i \sim \mu$ and $b_i \sim \{0, 1\}$, this set $D_1$ is a uniformly random dataset from $\mu$. The properties above imply weak independence for the GNS instance. Finally, except with small probability, we have $d(x_j, y) > cr$ for all $j \ne i$, so that for the $c$-approximate NNS instance, the answer is 1 if and only if $x_i$ is in $D_1$, i.e. if and only if $b_i = 1$. The claim follows. □

Thus to prove randomized data structure lower bounds for c-approximate NNS, it suffices to exhibit r
and a distribution e which satisfies the above properties and has large expansion.

4.2 Computing Expansion

Next we bound the expansion of the hypercube for appropriate distributions, which would imply the claimed lower bounds for the hypercube. The vertex expansion result for $\ell_\infty$ in [1] will lead to the cell probe lower bounds for $\ell_\infty$ proved in their work.

We will set $\mu$ to be the uniform distribution over the hypercube. For a set $A \subseteq H$, we let $\alpha = \mu(A)$. Observe that if we take $n$ uniformly random points from a $d$-dimensional hypercube then with high probability all pairs of points are at least $d/2 - O(\sqrt{d \log n})$ apart. Using bounds for the expansion for the ($G_r$ corresponding to the) $d$-dimensional hypercube, we will derive lower bounds for the near neighbor problem on the hypercube.
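The concentration claim about pairwise distances is easy to observe empirically. The snippet below is an illustrative Monte Carlo check only; the function name and the parameter choices in the usage note are assumptions, and points are encoded as Python integers whose bits are the coordinates.

```python
import random
from itertools import combinations


def min_pairwise_hamming(n: int, d: int, seed: int = 0) -> int:
    """Smallest Hamming distance among n uniform random points of {0,1}^d."""
    rng = random.Random(seed)
    points = [rng.getrandbits(d) for _ in range(n)]
    # XOR marks the differing coordinates; counting 1 bits gives the distance.
    return min(bin(u ^ v).count("1") for u, v in combinations(points, 2))
```

For instance, with $n = 50$ and $d = 1024$ the minimum distance stays within a few multiples of $\sqrt{d \log n}$ below $d/2 = 512$, in line with the observation above.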

The following lemma is proved in [6].

Lemma 4.3 (Vertex Expansion of Hypercube). Let $H = \{0, 1\}^d$ be the boolean hypercube, and let $G_r$ be the graph with the edge set $E_r = \{(u,v) : |u - v|_1 \le r\}$. Let $h_i = \frac{1}{2^d} \sum_{j=0}^{i} \binom{d}{j}$. Then the vertex expansion satisfies $\Phi_v(h_i) \ge h_{i+r}/h_i$.

Setting $r = d/3$ we see that the node expansion $\Phi_v \ge h_{d/2}/h_{d/2-d/3} = 2^{\Omega(d)}$. From Theorem 1.4 we get $(mwt/n)^t \ge 2^{\Omega(d)}$, or $t \ge d/\log(mwd/n)$. This gives us the deterministic lower bound for 2-approximation in the $\ell_1$ norm. Setting $r = \epsilon d/2$ gives us a lower bound for $O(1/\epsilon)$-approximation. For this value of $r$, $\Phi_v \ge h_{d/2}/h_{(1-\epsilon)d/2} = 2^{\Omega(\epsilon^2 d)}$. So we get the bound of $t \ge \epsilon^2 d/\log(mwd/n)$.
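To get a feel for the ratio $h_{d/2}/h_{d/2-r}$ used above, it can be evaluated exactly for moderate $d$. This is a small numerical sketch; the function names are hypothetical and the integer rounding of $d/2$ is an assumption made for convenience.

```python
from math import comb, log2


def ball_count(d: int, i: int) -> int:
    """Number of points of {0,1}^d of Hamming weight at most i (2^d * h_i)."""
    return sum(comb(d, j) for j in range(i + 1))


def log2_vertex_expansion(d: int, r: int) -> float:
    """log2 of h_{d/2} / h_{d/2 - r}, computed with exact integer counts."""
    half = d // 2
    return log2(ball_count(d, half)) - log2(ball_count(d, half - r))
```

With $r = d/3$ the logarithm grows linearly in $d$ (roughly 22.7 for $d = 60$ and about twice that for $d = 120$), matching the $2^{\Omega(d)}$ behaviour.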

For randomized lower bounds, we will use the distribution defined by the noise operator $T_\rho$ where $\rho = (1 - \frac{2r}{d})$. I.e. to sample from $e$, we sample $x$ from the uniform distribution $\mu$ and sample $y$ by flipping each bit of $x$ independently with probability $\frac{1-\rho}{2} = \frac{r}{d}$. It is easy to check that for any $r \le (\frac{1}{2} - \epsilon)d$ and $d$ being $\Omega(\log n/\epsilon^2)$, we indeed have Pr/^ j^)^g[(i(x, y) < (1 + jQ)r] > 1 - ^. Moreover, since $\mu = \nu$, by the discussion above, $\Pr_{y \sim \nu, z \sim \mu}[d(y, z) \le \frac{d}{2} - O(\sqrt{d \log n})] \le$ ^. Thus it remains to compute the expansion for appropriate $r$.
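Sampling from the noise distribution described above takes only a few lines. The sketch below is illustrative; `sample_pair` is a hypothetical name and the list-of-bits representation is an assumption.

```python
import random


def sample_pair(d: int, r: int, rng: random.Random):
    """Draw (x, y): x uniform in {0,1}^d, y a noisy copy of x.

    Each bit of x is flipped independently with probability
    (1 - rho) / 2 = r / d, where rho = 1 - 2r/d, so the expected
    Hamming distance between x and y is r.
    """
    rho = 1 - 2 * r / d
    p = (1 - rho) / 2
    x = [rng.randint(0, 1) for _ in range(d)]
    y = [b ^ int(rng.random() < p) for b in x]
    return x, y
```

Averaging the Hamming distance of a few sampled pairs gives a value close to $r$, which is the concentration used to verify the distance condition.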

It will be convenient to work with the edge expansion. 

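Definition 4.4. We define the edge expansion $\Phi_e(\delta)$ for a $(G, e)$ as $\Phi_e(\delta) = \min_{A : \mu(A) \le \delta} \frac{e(A,V)}{e(A,A)}$. Thus for any set $A$ of measure $\delta$, at most a $\frac{1}{\Phi_e(\delta)}$ fraction of the mass of edges incident on $A$ stays within $A$.

Observation 4.5. For any $(G, e)$, if $\mu$ is uniform, then $\Phi_r(\delta, \gamma) = \Omega(\gamma \Phi_e(2\delta))$.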

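Proof. First we will argue that for any sets $A$ and $B$ where $\mu(A) = \mu(B) \le \delta$, $e(A, B) \le 2e(A, V)/\Phi_e(2\delta)$. To see this note that

\[ e(A, B) \le e(A \cup B, A \cup B) \le \frac{1}{\Phi_e(\mu(A \cup B))}\, e(A \cup B, V) \le \frac{1}{\Phi_e(\mu(A \cup B))}\, 2e(A, V), \]

and $\Phi_e(\mu(A \cup B)) \ge \Phi_e(2\delta)$ since $\mu(A \cup B) \le 2\delta$.

Now consider any set $A$ of measure at most $\delta$, and let $B$ be any other set of measure $\delta\gamma\Phi_e(2\delta)/2$. We wish to argue that $e(A, B) \le \gamma e(A, V)$, which would imply the claim.

Let $B_1, \ldots, B_k$ be a partition of $B$ into $k = \lceil \gamma\Phi_e(2\delta)/2 \rceil$ pieces of measure $\delta$ each. Then $e(A,B) \le \sum_i e(A,B_i) \le k \cdot \frac{2e(A,V)}{\Phi_e(2\delta)}$. The claim follows.

□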



Lemma 4.6 (Edge expansion of Hypercube). Let $H = \{0, 1\}^d$ be the boolean hypercube, and $(G, e)$ be as above for $r \le \frac{d}{2}$. Then the edge expansion $\Phi_e(\alpha) \ge \alpha^{-r/(d-r)}$.

Proof. For sets $A$, $B$, it is easy to see that $e(A, B) = \langle T_\rho 1_A, 1_B \rangle$. But by the Hypercontractive inequality,

\[ \langle T_\rho 1_A, 1_A \rangle = \langle T_{\sqrt{\rho}} 1_A, T_{\sqrt{\rho}} 1_A \rangle = \|T_{\sqrt{\rho}} 1_A\|_2^2 \le \|1_A\|_{1+\rho}^2 = \alpha^{\frac{2}{1+\rho}}. \]

Also $e(A, V) = \alpha$. The claim follows by substituting the value of $\rho$. □

Setting $r = \epsilon d/2$, we get that for constant $\gamma$, $\Phi_r(\alpha, \gamma) \ge \alpha^{-\Omega(\epsilon)}$. From Theorem 1.5 (first inequality) it follows that $(mw/n)^t \ge m^{\Omega(\epsilon)}$, implying $m \ge (n/w)^{1+\Omega(\epsilon/t)}$.
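The hypercontractive bound in the proof of Lemma 4.6 can be checked by brute force on a small hypercube. The following is a sanity-check sketch, not part of the argument: the names `edge_mass` and `hamming_ball` are hypothetical, points are encoded as integers, and the choice of a Hamming ball for $A$ is only an example.

```python
def popcount(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


def edge_mass(A, B, d: int, rho: float) -> float:
    """e(A, B) = Pr[x in A, y in B] for x uniform and y a rho-noisy copy of x.

    Each bit is flipped with probability p = (1 - rho) / 2, so a fixed y at
    Hamming distance k from x has conditional probability p^k (1-p)^(d-k).
    """
    p = (1 - rho) / 2
    total = 0.0
    for x in A:
        for y in B:
            k = popcount(x ^ y)
            total += p ** k * (1 - p) ** (d - k)
    return total / 2 ** d


def hamming_ball(d: int, radius: int):
    """All points of {0,1}^d (as integers) of Hamming weight at most radius."""
    return [x for x in range(2 ** d) if popcount(x) <= radius]
```

For $d = 8$, $\rho = 1/2$ and $A$ a ball of radius 2, one finds $e(A, V) = \alpha$ and $e(A, A) \le \alpha^{2/(1+\rho)}$, as the inequality predicts.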

Lemma 4.7 (Robust expansion of Hypercube). Let $H = \{0, 1\}^d$ be the boolean hypercube, and let $(G, e)$ be as above for $r = (\frac{1}{2} - \epsilon)d$. Then for any sets A and B such that ii{B) < fi{A) ^ ~~d' , e{A,B) <

/i(A)(i-ir)'de(An

Proof. As in the proof of Lemma 4.6 we use the hypercontractive inequality. Writing $\alpha = \mu(A)$ and $\beta = \mu(B)$,

\[ e(A,B) = \langle T_\rho 1_A, 1_B \rangle \le \|T_\rho 1_A\|_2 \|1_B\|_2 \le \|1_A\|_{1+\rho^2} \|1_B\|_2 = \alpha^{\frac{1}{1+\rho^2}} \beta^{\frac{1}{2}}. \]

Since $e(A, V) = \alpha$, the claim follows from the assumed bound on $\mu(B)$. □

Corollary 4.8. For any (3>0,andr< |, setting p = 1 - |l, $„„ (/3, /Jp' ) > jS^-'^P^ for Gr as above. 
Setting $r = (1 - \epsilon)d/2$, we see from Theorem 1.5 that any randomized algorithm must use space

The vertex expansion result for $\ell_\infty$ has already been computed in the work by [1] for proving cell probe lower bounds for $\ell_\infty$. They consider the $d$-dimensional grid $\{0, 1, \ldots, m\}^d$ with a non-uniform measure defined as follows. The measure $\pi$ over $\{0, 1, \ldots, m\}$ is defined by $\pi(i) = 2^{-(2\rho)^i}$ for all $i > 0$ and $\pi(0) = 1 - \sum_{j>0} \pi(j)$. This then defines $\mu_d(x_1, x_2, \ldots, x_d) = \pi(x_1) \cdot \pi(x_2) \cdots \pi(x_d)$. For this measure $\mu_d$ the following expansion theorem was shown in [1].

Lemma 4.9 ([1]). $\mu_d(N(A)) \ge \mu_d(A)^{1/\rho}$

Now it is easy to show that if $n$ random points are chosen from the measure $\mu_d$ then every point is at least $g = \log_{2\rho}(\frac{1}{2} \log d)$ away from the origin under the $\ell_\infty$ norm. This is because the probability that a certain coordinate of a point is at least $g$ is at least $1/d^{1/2}$. So the probability that no coordinate is more than $g$ is at most $(1 - 1/d^{1/2})^d \le e^{-d^{1/2}}$. For $d = \Omega(\log^{2+\epsilon} n)$ this is at most $1/n^{\omega(1)}$. So all points will be at least distance $g$ from the origin. In fact this argument also easily shows that they are at distance at least $g$ from each other. So setting $r = 1$ gives us a lower bound for $(g - 1, r)$-ANNS for $\ell_\infty$. From Theorem 1.4 it follows that to get a $O(\log_\rho \log d)$ approximation for NNS on $\ell_\infty$ the amount of space required is at least
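The coordinate measure $\pi$ and the threshold $g$ from the discussion above can be computed directly. This is a small numerical sketch assuming base-2 logarithms (so that $\pi(g) = d^{-1/2}$ holds exactly) and $\rho > 1$; the function names are hypothetical.

```python
from math import log, log2


def pi_measure(rho: float, m: int) -> list:
    """Coordinate measure on {0, ..., m}: pi(i) = 2^{-(2 rho)^i} for i > 0,
    with pi(0) taking the remaining mass."""
    tail = [2.0 ** (-((2 * rho) ** i)) for i in range(1, m + 1)]
    return [1.0 - sum(tail)] + tail


def threshold_g(rho: float, d: int) -> float:
    """g = log_{2 rho}((1/2) log2 d), the value at which pi(g) = d^{-1/2}."""
    return log(0.5 * log2(d), 2 * rho)
```

For example, with $\rho = 2$ and $d = 2^{32}$ we get $g = 2$ and $\pi(2) = 2^{-16} = d^{-1/2}$, so each coordinate reaches the value $g$ with probability about $1/\sqrt{d}$.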

5 A Matching Upper Bound 

We already know that the lower bound is tight in many specific cases. Here we show that the tightness holds
more generally, for highly symmetric graphs. For such graphs, we show that the notion of robust expansion
correctly captures the complexity of GNS for the regime with a constant number of queries.

Let $G$ be an undirected Cayley graph¹ and assume that it has the weak independence property for the uniform distribution. Let $m$ be such that $m = n\Phi_r(\frac{1}{m}, \gamma)$ (denoted by $\Phi$ for brevity) where $\gamma >$ f. Below we describe a data structure with $m$ cells and word size $O(\log n)$ which can solve GNS in a single query with constant probability for the hard distribution of inputs. This matches the lower bound of Theorem 1.5 for the case $t = 1$.

Theorem 5.1. Let $G$ be an undirected Cayley graph that has the weak independence property for the uniform distribution. Let $m$ be such that $m = n\Phi_r(\frac{1}{m}, \gamma)$ (denoted by $\Phi$ for brevity) where $\gamma >$ |. Then there is a 1-probe data structure that uses $m$ words of $w = \log |G|$ bits each that succeeds with constant probability when the data-set points $x_1, \ldots, x_n$ are drawn randomly and independently, and the query point is a random neighbor of a random $x_i$.

We observe that this is the distribution for which we show the lower bound. 

Proof. The main idea is to use the low expanding sets in order to construct something similar to a Locality 
Sensitive Hashing solution. We stress that the upper bound is in the cell probe model, which allows us to 
ignore the (practically very important) issue of actually computing the LSH efficiently. 

Let $A \subseteq V$ be a set of measure $1/m$ for which the robust expansion is $\Phi$ and $m = n\Phi$. By the definition of robust expansion, we know that there is a set $B$ of measure $\Phi/m$, such that $|E(A, B)| \ge \gamma|E(A, V)|$. We take $m$ random translations of $A$ and $B$, denoted by $A_1, \ldots, A_m$ and $B_1, \ldots, B_m$. Formally, we sample uniformly $m$ elements $a_1, \ldots, a_m$ from the group underlying the Cayley graph, and set $A_i = \{u : u = a_i + v, v \in A\}$ and similarly $B_i = \{u : u = a_i + v, v \in B\}$. The translation by $a_i$ is an automorphism that maps $A$ to $A_i$ and $B$ to $B_i$, so for each $i$, $|E(A_i, B_i)| \ge \gamma|E(A_i, V)|$.

We construct a table $T$ with $m$ cells as follows: Given a data set point $x$, we check for each $i \le m$ whether $x \in B_i$, and if so we place $x$ in $T_i$. Note that the measure of each $B_i$ is $\Phi/m$ so that the expected number of data set points $x_j$ that fall in $B_i$ is $n\Phi/m$, which is 1. For random data sets, most $B_i$'s will contain at most (say) 10 data-set points. In order to keep the word size small we store at most 10 data set points in each table cell $T_i$, and assuming that $O(\log n)$ bits suffice to represent a data-set point, we have $w = O(\log n)$.

Now, given a query point $y$ we find an $i$ for which $y \in A_i$ and output the data-set point in $T[i]$ which is closest to $y$. Note that with constant probability, such an $i$ exists and is unique.

Recall that the data set $x_1, \ldots, x_n$ is obtained by sampling $n$ points uniformly and independently from $V$. Further, we assume that this distribution is weakly independent; i.e. if $x$ and $y$ are random nodes and $z$ is a random neighbor of $x$, then $\Pr[z \in N(y)] \le 1/100n$. Further, the query point is obtained by sampling a random neighbor of a random data set point. Assume that the correct answer is $x$. Now if $y \in A_i$ (which happens with a constant probability), then the lookup succeeds if $x \in B_i$ and there were fewer than 10 data set points in $B_i$. The first event occurs with probability $\gamma >$ | and the second event occurs independently with probability at least | as well. We conclude that the data structure succeeds with constant probability. □
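The construction in this proof can be sketched on a concrete Cayley graph. The code below is an illustration under several assumptions that are not in the text: the group is $\mathbb{Z}_2^d$ with XOR as the group operation, $A$ and $B$ are Hamming balls of radii `ra` and `rb` around the origin (with `rb = ra + r`, every neighbor of a point of $A$ lies in $B$), and the class name `OneProbeTable` is hypothetical. As in the proof, the cost of locating the right cell is ignored; only the single table read counts as a probe.

```python
import random


def popcount(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


class OneProbeTable:
    """Table built from m random translates A_i, B_i of two Hamming balls."""

    def __init__(self, data, d, ra, rb, m, cap=10, seed=0):
        rng = random.Random(seed)
        self.ra = ra
        # Random group elements a_1..a_m; translation is XOR with a_i.
        self.shifts = [rng.getrandbits(d) for _ in range(m)]
        self.cells = [[] for _ in range(m)]
        for x in data:
            for i, a in enumerate(self.shifts):
                # x lies in B_i iff x XOR a_i lies in the ball B.
                if popcount(x ^ a) <= rb and len(self.cells[i]) < cap:
                    self.cells[i].append(x)

    def query(self, y):
        """Find an i with y in A_i, read cell T[i], return its closest point."""
        for i, a in enumerate(self.shifts):
            if popcount(y ^ a) <= self.ra:
                cell = self.cells[i]          # the single probe
                if not cell:
                    return None
                return min(cell, key=lambda x: popcount(x ^ y))
        return None                           # y is covered by no A_i
```

When a query $y$ lands in some $A_i$ and the cell has not overflowed, every data point within distance `rb - ra` of $y$ is stored in $T[i]$, so the returned point is a correct near neighbor.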

6 Low Contention Dynamic Data Structures 

Let $Q_l$ denote the set of queries that read cell $T[l]$ from the table.

Definition 6.1. A data structure is said to have contention $\tau$ if $\nu(Q_l) \le \tau$ for all $l$.
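Definition 6.1 can be made concrete by computing the contention of a given lookup scheme over a finite query distribution. The sketch below is illustrative only: `contention` and its arguments are hypothetical names, and `probes` is an assumed callback that returns the cells a query reads.

```python
from collections import defaultdict


def contention(queries, weights, probes) -> float:
    """Largest query mass that reads any single cell.

    queries: iterable of query points; weights: their probabilities under
    the query distribution; probes(q): the cells read when answering q.
    Returns max over cells l of the total weight of Q_l.
    """
    mass = defaultdict(float)
    for q, w in zip(queries, weights):
        for cell in set(probes(q)):      # each query counts once per cell
            mass[cell] += w
    return max(mass.values(), default=0.0)
```

A scheme that spreads 8 equally likely queries over 4 cells has contention 1/4, while one in which every query reads a common cell has contention 1.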



¹We need the graph to be highly symmetric and the symmetries of Cayley graphs are convenient for the claims we need.

Say we insert a new point $x$. After the insertion there would be a subset $N'(x) \subseteq N(x)$ such that the query algorithm outputs the correct answer for any $y \in N'(x)$. We say the insertion of $x$ is successful if the measure of this set under $\nu_x$ is at least $1 - \gamma$.

We remark that we look at contention only for the part of the data structure that depends on the database; 
accesses to any randomness are free. 

We prove Theorem 1.8, which states that in such data structures the update time is at least J7(<l>r(T, 0{^))/t'^)



Proof of Theorem 1.8. Consider the state of a $t$-probe data structure just before we insert a point. The $t$ lookup functions each give a partitioning of $V$ such that each part in the partitioning has measure at most $\tau$ under $\nu$. Let $Q^i_1, \ldots, Q^i_m$ be the $i$th partitioning.

Let $L$ be the set of locations in $T$ that are updated when one inserts $x$. Thus $|L| \le t_u$. Further, note that for the answer for $y$ to be correct both before and after the insertion, $L$ must intersect with at least one of the locations that are queried on $y$, i.e. $y \in Q^i_l$ for some $i$ and some $l \in L$.

By assumption each of the $t$ partitions is $\tau$-sparse. Lemma 3.11 implies that except with small probability, $x$ is {K,^/2t) shattered where K = ^r{T, ^)7^/16i^ by each of the partitions. Strong shattering would imply that for each $l$, the measure $\nu_x(Q^i_l) \le \frac{1}{K}$, so that $|L|$ changes can account for at most a $t|L|/K$ measure being affected, which would imply the result.

Since we only have weak shattering, we recall that shattering implies that for any i, there is a measure P^

such that Di{V) > (1 - ^) and i>i{QJ) < ^ for any l. Let £>^.(y) =^ min^ P^(y). Thus D^iV) > (1 - ^).



Finally, 



1-7 < M^iQi) 



< 



t 

7 



+Y.'^-(Ql) 



i=l 



< l + t\L\/K 
Thus $t_u \ge |L| \ge K/2t$ = ^r{r, ^)'y^/32t'^. □